QA/Automation/Projects/Mozmill Automation/Mozmill CI/Duties: Difference between revisions

Line 11: Line 11:
This task should be done first thing in the morning and again before leaving the office.


Usually, all our nodes should be up and running. The exceptions are when we take a node offline and keep it offline because it has issues that have not been fixed yet, or for testing or updates, which should only take a short period of time.

All these cases should be explained with an offline status message when taking a node offline.


This has the format: [Username] Reason the node is offline
Line 17: Line 19:
Example: [Andreea] Updating Flash / Testing bug 123456


So when you scroll over the list of nodes and see one offline, click on it and check the message. If it's something important, leave it as it is.

You can also check with the person who took it offline; they might have forgotten to put it back online.

In any case, testing should only be done during the day. At the end of the day all machines should be back online.
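
Part of this check can be scripted. The following is only a minimal sketch, assuming the Jenkins JSON API at /computer/api/json (which reports displayName, offline and offlineCauseReason for each node) and assuming that offlineCauseReason carries the message exactly as it was typed. The function names and the sample data are hypothetical and not part of mozmill-ci.

```python
import re

# Matches the "[Username] Reason the node is offline" convention.
OFFLINE_MESSAGE = re.compile(r"^\[(?P<user>[^\]]+)\]\s*(?P<reason>.+)$")


def parse_offline_message(message):
    """Return (user, reason), or None if the message does not follow the format."""
    match = OFFLINE_MESSAGE.match(message.strip())
    if match is None:
        return None
    return match.group("user"), match.group("reason").strip()


def offline_nodes(computer_data):
    """List (node name, parsed message) for every offline node.

    computer_data is the parsed JSON from /computer/api/json (assumed shape).
    """
    report = []
    for node in computer_data.get("computer", []):
        if node.get("offline"):
            message = node.get("offlineCauseReason") or ""
            report.append((node.get("displayName"), parse_offline_message(message)))
    return report


# Hypothetical sample data shaped like the Jenkins response.
sample = {
    "computer": [
        {"displayName": "node-a", "offline": False, "offlineCauseReason": ""},
        {"displayName": "node-b", "offline": True,
         "offlineCauseReason": "[Andreea] Updating Flash / Testing bug 123456"},
        {"displayName": "node-c", "offline": True, "offlineCauseReason": ""},
    ]
}

for name, parsed in offline_nodes(sample):
    if parsed is None:
        print(f"{name}: offline without a proper status message")
    else:
        print(f"{name}: taken offline by {parsed[0]} ({parsed[1]})")
```

A node that is reported without a proper message is a good candidate for asking around before putting it back online.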


== How to put a node back online ==
You should check the machine via ssh and see whether it has a screen session (screen -x) and whether it shows 'Connected'. That means it is connected via Java and is only marked offline in Jenkins.

If all is fine on the machine, just click "Mark this node online".
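
As a rough aid for the first half of this check, the sketch below lists the screen sessions on a node without attaching to them. It assumes key-based ssh login and the usual output layout of screen -ls (one tab-indented pid.name entry per session). The function names and the host are hypothetical. It only tells you that a session exists; to see whether it shows 'Connected' you still have to attach with screen -x.

```python
import re
import subprocess

# A session line in `screen -ls` output looks like "\t4321.pts-0.host\t(Detached)".
SESSION_LINE = re.compile(r"^[ \t]+(\d+)\.(\S+)", re.MULTILINE)


def list_screen_sessions(screen_ls_output):
    """Extract (pid, name) pairs from the text printed by `screen -ls`."""
    return [(int(pid), name) for pid, name in SESSION_LINE.findall(screen_ls_output)]


def remote_screen_sessions(host):
    """Run `screen -ls` on the given host over ssh and parse the result."""
    # check=False: some screen versions exit non-zero even when sessions exist.
    result = subprocess.run(
        ["ssh", host, "screen", "-ls"],
        capture_output=True, text=True, check=False,
    )
    return list_screen_sessions(result.stdout)


# Parsing a captured sample instead of contacting a real machine.
sample_output = (
    "There is a screen on:\n"
    "\t4321.pts-0.mm-node\t(Detached)\n"
    "1 Socket in /var/run/screen/S-mozauto.\n"
)
print(list_screen_sessions(sample_output))
```

An empty list means there is no screen session on the machine at all, in which case the node is not connected.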


If the machine is not connected, then you need to follow these instructions from mana.


== Check Jenkins Health ==
=== Monitoring ===
* Address: mm-ci-production.qa.scl3.mozilla.com:8080/monitoring  |  mm-ci-staging.qa.scl3.mozilla.com:8080/monitoring
You need to go through the panels and make sure all values are in the normal range, which is usually marked with green. If something is red, please let us know on the mailing list, file an issue on mozmill-ci or a bug under Infrastructure (depending on the issue), and write in the etherpad. You can also contact people on IRC.


 
=== Pulse ===
Connect to the master machine and check in screen (screen -x) that pulse is connected on all panels (daily, l10n, release). Switch between them with CTRL + A followed by the number of the panel to switch to (1, 2, 3, ...).


Line 38: Line 45:
If the job doesn't have a Mozmill Dashboard link, then the testrun did not complete successfully. Either it ran all tests but failed to send the report, or it failed in one of the other steps we do at the end. To identify the problem, go to the specific log link (from "View the build in Jenkins") and check the issue.


* Known issues:
** Bug 915563 - Testrun stopped with "socket.error: [Errno 10022] WSAEINVAL"
** Cannot delete workspace problem on Windows: https://github.com/mozilla/mozmill-ci/issues/358


For aborted emails, these have the following subject: