It is assumed that, regardless of how each data type is collected (parsing httpd logs, collating data from temporary tables, or storing real-time data in the database), we can expect '''all''' data in the previous section to be available reliably and accurately in the database during a predefined 30-minute window within every hour. (Note: we don't care about DST quirks, shifts and jumps here; we're only talking about whole hours.)
It is assumed that we have enough CPU power and disk speed to update all RRD files associated with all add-ons during the 30-minute window, as long as we distribute the load across that window, so as to avoid hampering other routine activities on the server (a sketch of one way to spread the load follows these assumptions). Updating one RRD is neither CPU- nor HDD-intensive, but depending on server load and performance, this might be a factor when updating thousands of RRD files. To be evaluated.
It is assumed that losing statistical data older than a few weeks at one-hour resolution, and data older than one or two years at one-day resolution, is planned for and accepted.
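The load-spreading idea above can be illustrated with a minimal sketch (Python driving the <code>rrdtool</code> command line). The file layout, the per-minute slotting and the value format are assumptions made for illustration only, not the project's actual tooling.

<pre>
# Hypothetical sketch: spread the hourly RRD updates across the 30-minute window
# by hashing each add-on ID to a fixed minute slot, so each minute touches
# roughly 1/30th of the RRD files. Paths and value format are assumptions.
import hashlib
import subprocess

WINDOW_MINUTES = 30

def minute_slot(addon_id: int) -> int:
    """Stable minute offset (0-29) for this add-on inside the update window."""
    digest = hashlib.sha1(str(addon_id).encode()).hexdigest()
    return int(digest, 16) % WINDOW_MINUTES

def update_rrd(addon_id: int, values: str) -> None:
    """Push one hour's values (e.g. "N:1234:56:4.5") into the add-on's RRD."""
    rrd_path = f"/var/stats/rrd/{addon_id}.rrd"   # assumed file layout
    subprocess.run(["rrdtool", "update", rrd_path, values], check=True)

def run_minute(current_minute: int, hourly_data: dict) -> None:
    """Update only the add-ons whose slot matches the current minute."""
    for addon_id, values in hourly_data.items():
        if minute_slot(addon_id) == current_minute:
            update_rrd(addon_id, values)
</pre>

Called once per minute (e.g. from cron) during the window, this keeps the per-minute load roughly constant regardless of the total number of add-ons.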
The size of the source RRA file on Oct 14, 2006 was less than 14 KB.
'''Note:''' I'm fiddling with the example page, trying to find the best aggregation variations. The basic data sources (total dl, weekly dl, rating) and granularity (1h) stay constant; I'm only working on aggregation periods, graph number and graph sizes. Therefore, the actual example may differ in those specific elements from what's described here.
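For context, here is a hypothetical <code>rrdtool create</code> call for one add-on's RRD, using the three data sources and 1h base granularity listed in the note above; the heartbeats, row counts and exact consolidation periods are assumptions, not the definition actually used for the example page.

<pre>
# Hedged sketch of a per-add-on RRD definition: 1-hour step, hourly data kept
# about two weeks, daily averages kept about two years (per the assumptions).
import subprocess

def create_addon_rrd(rrd_path: str) -> None:
    subprocess.run([
        "rrdtool", "create", rrd_path,
        "--step", "3600",                 # one primary data point per hour
        "DS:total_dl:GAUGE:7200:0:U",     # cumulative download count
        "DS:weekly_dl:GAUGE:7200:0:U",    # downloads over the last 7 days
        "DS:rating:GAUGE:7200:0:5",       # average rating, assumed 0-5 scale
        "RRA:AVERAGE:0.5:1:336",          # 1h resolution, ~2 weeks of rows
        "RRA:AVERAGE:0.5:24:730",         # 1-day resolution, ~2 years of rows
    ], check=True)
</pre>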
==Implementation==
#* If the witness file's timestamp is newer than the RRA, exit;
#* If the witness file's timestamp is older than the RRA, the AGRP generates new graphs based on the RRA and then touches<sup>3</sup> the witness file (a sketch of this check follows the notes below);
# The script at (1) proceeds, serving the graph images as available on the storage device.
:<small><sup>1</sup>: "Blocking" in that subsequent code is not executed until the external script is finished.</small>
:<small><sup>2</sup>: Future developments may change the currently proposed set of graphs, and/or the order in which they are generated — therefore it's good practice to avoid checking against the timestamp of a particular graph file; an independent witness file can be guaranteed to be available and have a meaningful timestamp regardless of any graphs being generated, or their order.</small>
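As announced in the steps above, a minimal sketch of the witness-file check (again Python over the <code>rrdtool</code> CLI); the paths, the single graph shown and its definition are illustrative assumptions, not the actual AGRP.

<pre>
# Hedged sketch of the AGRP freshness check: if the witness file is newer than
# the RRA, the graphs are current and nothing is done; otherwise regenerate
# the graphs from the RRA and touch the witness file.
import os
import subprocess
from pathlib import Path

def regenerate_if_stale(rrd_path: str, witness_path: str, graph_dir: str) -> None:
    witness = Path(witness_path)
    if witness.exists() and witness.stat().st_mtime >= os.path.getmtime(rrd_path):
        return   # witness newer than the RRA: graphs are already up to date

    # Render the graphs from the RRA (only one shown here, as an example)...
    subprocess.run([
        "rrdtool", "graph", os.path.join(graph_dir, "weekly_dl.png"),
        "--start", "-2w",
        f"DEF:dl={rrd_path}:weekly_dl:AVERAGE",
        "LINE1:dl#0000ff:weekly downloads",
    ], check=True)

    # ...then touch the witness file so subsequent requests see it as fresh.
    witness.touch()
</pre>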