Auto-tools/Projects/OrangeFactor/2010-11-03: Difference between revisions

A setup for handling the logs at different stages of processing was proposed as follows:


1 - A script saves new buildbot logs to a directory, say 'incoming_sendjson'.
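Step 1 might look something like the following sketch. This is illustrative only: the base URL, helper names, and fetch mechanism are assumptions, not the actual script.

```python
import os
import urllib.request

# Assumed source and destination; the proposal only says "a script
# saves new buildbot logs to a directory".
BASE_URL = "http://stage.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds"
INCOMING = "incoming_sendjson"

def incoming_path(relpath):
    """Destination for a fetched log: keep only the file name."""
    return os.path.join(INCOMING, os.path.basename(relpath))

def save_log(relpath):
    """Fetch one buildbot log into 'incoming_sendjson'.

    Writing to a temporary name first keeps downstream watchers from
    ever seeing a partially written file.
    """
    dest = incoming_path(relpath)
    tmp = dest + ".part"
    urllib.request.urlretrieve(BASE_URL + "/" + relpath, tmp)
    os.rename(tmp, dest)
```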


2 - A Flume pipeline uses one of the exec sources [http://archive.cloudera.com/cdh/3/flume/UserGuide.html#_flume_source_catalog] to execute a script that invokes the log parser on batches of buildbot logs sitting in the 'incoming_sendjson' directory.  For each of those files, the script emits single-line JSON objects to stdout.  When the Flume agent invokes this script, it stores each line of stdout in the Hive table for JSON results.  The script should move each successfully processed buildbot log from 'incoming_sendjson' to 'incoming_sendlog'.
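A sketch of the script the Flume exec source could run. Here parse_log() is a stand-in for the real buildbot log parser, and the directory names follow the proposal above; none of this is the actual implementation.

```python
import json
import os
import shutil

INCOMING = "incoming_sendjson"   # logs waiting to be parsed
DONE = "incoming_sendlog"        # logs handed to the next stage

def parse_log(path):
    """Placeholder for the real buildbot log parser; yields one dict
    of parsed results per log."""
    yield {"file": os.path.basename(path)}

def process_batch():
    """Parse every waiting log, emitting one single-line JSON object
    per result to stdout (which Flume stores as rows in the Hive JSON
    table), then move each processed log on to 'incoming_sendlog'."""
    for name in sorted(os.listdir(INCOMING)):
        path = os.path.join(INCOMING, name)
        for result in parse_log(path):
            print(json.dumps(result))
        shutil.move(path, os.path.join(DONE, name))
```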


3 - Another Flume pipeline will exec a separate script that processes batches of buildbot logs sitting in the 'incoming_sendlog' directory.  This script should cat each log file, prepending tab-separated metadata in the following format (' \t ' represents a tab in the output):
  repo \t platform \t debug-or-opt \t builddate \t testsuite \t line-num \t single-log-line
'''Example buildbot log input'''
  mozilla-central-macosx64-debug/1288731175/mozilla-central_snowleopard-debug_test-crashtest-build94.txt.gz: line 1
  mozilla-central-macosx64-debug/1288731175/mozilla-central_snowleopard-debug_test-crashtest-build94.txt.gz: line 2
  mozilla-central-macosx64-debug/1288731175/mozilla-central_snowleopard-debug_test-crashtest-build94.txt.gz: line 3
  mozilla-central-macosx64-release/1288731175/mozilla-central_snowleopard-debug_test-crashtest-build94.txt.gz: line 1
  mozilla-central-macosx64-release/1288731175/mozilla-central_snowleopard-debug_test-crashtest-build94.txt.gz: line 2
  mozilla-central-macosx64-release/1288731175/mozilla-central_snowleopard-debug_test-crashtest-build94.txt.gz: line 3
'''Example sendlog script stdout output'''
  mozilla-central \t macosx64 \t debug \t 1288731175 \t crashtest \t 1 \t line 1
  mozilla-central \t macosx64 \t debug \t 1288731175 \t crashtest \t 2 \t line 2
  mozilla-central \t macosx64 \t debug \t 1288731175 \t crashtest \t 3 \t line 3
  mozilla-central \t macosx64 \t release \t 1288731175 \t crashtest \t 1 \t line 1
  mozilla-central \t macosx64 \t release \t 1288731175 \t crashtest \t 2 \t line 2
  mozilla-central \t macosx64 \t release \t 1288731175 \t crashtest \t 3 \t line 3
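The path-to-metadata transformation shown in these examples could be sketched as follows. The suite-name regex is an assumption inferred from the example file names, not a documented rule.

```python
import re

def path_to_fields(relpath):
    """Derive (repo, platform, debug-or-opt, builddate, testsuite)
    from a relative path like
    mozilla-central-macosx64-debug/1288731175/..._test-crashtest-build94.txt.gz
    """
    builddir, builddate, filename = relpath.split("/")
    # e.g. 'mozilla-central-macosx64-debug' -> repo, platform, buildtype
    repo, platform, buildtype = builddir.rsplit("-", 2)
    # Assumed pattern: the suite name sits between '_test-' and '-buildNN'.
    suite = re.search(r"_test-(.+?)-build\d", filename).group(1)
    return repo, platform, buildtype, builddate, suite

def emit_log(relpath, lines):
    """Print each log line prefixed with the tab-separated metadata
    and a 1-based line number, matching the format above."""
    fields = path_to_fields(relpath)
    for num, line in enumerate(lines, start=1):
        print("\t".join(fields + (str(num), line.rstrip("\n"))))
```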


After catting, each log is moved to a 'processed' directory.
 
 
The 'repo', 'platform', 'debug-or-opt', 'builddate' and 'testsuite' fields above are derived from the original location of the log on http://stage.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds, e.g., the output for
 
mozilla-central-macosx64-debug/1288731175/mozilla-central_snowleopard-debug_test-crashtest-build94.txt.gz
 
would be:
 
  mozilla-central \t macosx64 \t debug \t 1288731175 \t crashtest \t line-num \t single-log-line


Each of these properties will wind up being a column in the logdata database, so we should be able to associate these logs with the parsed log data.  If the parsed log data includes indexes into these logs, then we'll be able to retrieve the portion of the log that represents the execution of any individual test.