ReleaseEngineering/How To/Self Provision a TaskCluster Windows Instance
For generic-worker 10.5.0 onwards
Integration with taskcluster one-click loaner workflow will be done in bug 1368961.
Using this method, you can get an interactive RDP session to a live worker for 12 hours. After 12 hours, the worker will be killed. Please note, AWS may issue a Spot Termination at any time, so if you are very unlucky, the worker may die before the 12 hours are over.
1) Find the task you want to play with in Treeherder, and follow the link to the Taskcluster Task Inspector.
2) Go to Actions -> Edit Task.
3) Add rdpInfo to the payload section:
payload:
  rdpInfo: 'login-identity/<login-identity>/rdpinfo.txt'
For example:
payload:
  rdpInfo: 'login-identity/mozilla-auth0/ad|Mozilla-LDAP|pmoore/rdpinfo.txt'
(Check https://tools.taskcluster.net/credentials to see what your login identity is; you should have the scope assume:login-identity:<login-identity>.)
4) Check which workerType the task uses from the task definition, and then add the following scope to the list of task scopes:
scopes:
  - 'generic-worker:allow-rdp:aws-provisioner-v1/<workerType>'
For example:
scopes:
  - 'generic-worker:allow-rdp:aws-provisioner-v1/gecko-t-win7-32'
5) If you will require administrator privileges:
5.1) Ensure you have the scope generic-worker:os-group:Administrators in https://tools.taskcluster.net/credentials/
If you do not have the scope, request it using a Service Request or by asking in the #taskcluster IRC channel.
5.2) Add the following to the task definition (the scope goes in the task scopes, the osGroups entry in the payload):
scopes:
  - generic-worker:os-group:Administrators
payload:
  osGroups:
    - Administrators
6) Run the task, and when it starts, go to Run Artifacts to see the rdpinfo.txt file appear with RDP connection information.
7) Enter the connection information into your RDP client of choice.
8) Connect with a screen resolution of 1280x1024. This resolution is important for gecko tests, since it is the screen size the tests use, and the screen size cannot change once you have made a connection.
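The edits from steps 3 to 5 can also be sketched in code. The following is a minimal illustration, assuming the task definition is available as a Python dict with the same shape as the YAML shown under Edit Task. The function name add_rdp_access and its parameters are hypothetical, not part of any Taskcluster tooling.

```python
import copy


def add_rdp_access(task, login_identity, admin=False):
    """Return a copy of a task definition with the RDP settings from steps 3-5.

    Hypothetical helper: `task` is a dict shaped like the task definition
    shown under Actions -> Edit Task.
    """
    edited = copy.deepcopy(task)
    payload = edited.setdefault("payload", {})
    scopes = edited.setdefault("scopes", [])

    # Step 3: where the worker should publish the RDP connection details.
    payload["rdpInfo"] = "login-identity/%s/rdpinfo.txt" % login_identity

    # Step 4: the allow-rdp scope names the workerType of the task.
    # The steps above use aws-provisioner-v1, so this sketch does too.
    wanted = [
        "generic-worker:allow-rdp:aws-provisioner-v1/%s" % edited["workerType"]
    ]

    # Step 5.2: optional administrator privileges.
    if admin:
        wanted.append("generic-worker:os-group:Administrators")
        groups = payload.setdefault("osGroups", [])
        if "Administrators" not in groups:
            groups.append("Administrators")

    # Append only the scopes that are not already present.
    for scope in wanted:
        if scope not in scopes:
            scopes.append(scope)
    return edited


if __name__ == "__main__":
    task = {"workerType": "gecko-t-win7-32", "scopes": [], "payload": {}}
    edited = add_rdp_access(task, "mozilla-auth0/ad|Mozilla-LDAP|pmoore", admin=True)
    print(edited["payload"]["rdpInfo"])
    print(edited["scopes"])
```

The helper only prepares the definition; you still need to hold the scopes yourself (step 5.1) for the edited task to be accepted.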
Performing operations as Administrator
Until bug 1465374 is resolved this can be a little tricky.
1) Make sure you followed step 5 above!
2) Open a regular command shell (e.g. Start Menu -> Run -> cmd.exe)
3) From there check which users are in the Administrators group:
C:\Windows\System32>net localgroup Administrators
Alias name     Administrators
Comment        Administrators have complete and unrestricted access to the computer/domain
Members
-------------------------------------------------------------------------------
Administrator
task_1527672240
The command completed successfully.
4) Check that the user you are logged in as is one of the users listed above:
C:\Windows\System32>whoami
i-015fe55bb8553\task_1527672240
5) Open a new UAC elevated command shell:
C:\Windows\System32>powershell.exe Start-Process cmd.exe -Verb runAs
6) This will require you to enter administrative credentials. In the credentials prompt that appears, click on More choices, select the task user, and paste the task password (Ctrl-V).
You should now have a command shell running as Administrator!
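Steps 3 and 4 amount to comparing two pieces of command output. As a rough sketch, the comparison could be scripted as below. The function names are hypothetical, and the parser assumes the English-language output layout shown above (a dashed rule before the member list and a trailing "The command completed" line).

```python
SAMPLE_OUTPUT = """Alias name     Administrators
Comment        Administrators have complete and unrestricted access to the computer/domain

Members

-------------------------------------------------------------------------------
Administrator
task_1527672240
The command completed successfully.
"""


def parse_localgroup_members(output):
    """Extract member names from `net localgroup <group>` output."""
    members = []
    in_members = False
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("---"):
            in_members = True  # the dashed rule precedes the member list
            continue
        if not in_members or not line:
            continue
        if line.startswith("The command completed"):
            break  # trailer line (assumes English output)
        members.append(line)
    return members


def short_name(account):
    """Drop any host or domain prefix and lowercase the account name."""
    return account.strip().split("\\")[-1].lower()


def is_listed(whoami_output, members):
    """True if the user printed by `whoami` is one of the group members."""
    return short_name(whoami_output) in {short_name(m) for m in members}


if __name__ == "__main__":
    members = parse_localgroup_members(SAMPLE_OUTPUT)
    print(members)
    print(is_listed("i-015fe55bb8553\\task_1527672240", members))
```

On a real loaner, the two strings would come from running net localgroup Administrators and whoami in the command shell.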
Rerunning tasks after they have completed
If you want to rerun a step after the task has completed, there is a script for each command in the task payload. They are named like this:
Z:\task_XXXXXXX\command_000000_wrapper.bat
Z:\task_XXXXXXX\command_000001_wrapper.bat
Z:\task_XXXXXXX\command_000002_wrapper.bat
....
Simply run them by hand to reproduce any of the task steps. If you wish to make changes, feel free to edit them. After the loan expires, the task directory, including all your changes, will be deleted. Therefore make sure to keep track of any changes that need to be put back in the gecko build system.
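If a task has many commands, it can help to list the wrapper scripts in execution order before rerunning them. The sketch below assumes only the naming pattern shown above; the function name wrapper_scripts is hypothetical, and the demo uses a temporary directory in place of a real Z:\task_XXXXXXX folder.

```python
import tempfile
from pathlib import Path


def wrapper_scripts(task_dir):
    """Return the command_NNNNNN_wrapper.bat scripts in task order.

    The index is zero-padded, so a plain lexicographic sort of the
    paths matches the order of the commands in the task payload.
    """
    return sorted(Path(task_dir).glob("command_*_wrapper.bat"))


if __name__ == "__main__":
    # Stand-in for a task directory, created only for this demo.
    with tempfile.TemporaryDirectory() as task_dir:
        for index in (1, 0, 2):
            Path(task_dir, "command_%06d_wrapper.bat" % index).touch()
        for script in wrapper_scripts(task_dir):
            print(script.name)
```

Each listed script can then be run by hand from the command shell, as described above.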
For earlier versions of generic-worker (<10.5.0)
How does it work?
TaskCluster Windows instances can be borrowed for debugging at will. There is no need to raise a bug. The process is triggered by running a special task that tells the instance you intend to commandeer it.
When you trigger this task, a recurring scheduled job on the instance will:
- perform a little cleanup, deleting data and configuration that is only pertinent for automation
- change the instance credentials, so that only you will have access to the instance and so that the instance credentials can be shared with you
- place encrypted credentials in the task artifacts for you
- delete the generic-worker and associated user and configuration
How is it triggered?
Visit https://tools.taskcluster.net/task-creator and create a new task that looks something like below.
provisionerId: aws-provisioner-v1
workerType: gecko-t-win7-32
retries: 0
created: '2017-05-24T12:05:16.556Z'
deadline: '2017-05-24T13:05:16.556Z'
expires: '2018-05-24T09:08:08.128Z'
scopes: []
payload:
  maxRunTime: 300
  artifacts:
    - path: ..\loan
      name: public/test_info
      type: directory
      expires: '2018-05-24T08:49:41.238Z'
  env:
    REQUESTER_EMAIL: grenade@mozilla.com
    REQUESTER_PUBLIC_KEY_URL: https://keybase.io/grenade/pgp_keys.asc?fingerprint=1c09ac24c113c7f080dd4aa5b3c5a958508a43f2
  command:
    - >-
      c:\mozilla-build\python\python.exe -c "import os, json; json.dump({'requester': { 'email': os.environ['REQUESTER_EMAIL'], 'publickeyurl': os.environ['REQUESTER_PUBLIC_KEY_URL'], 'taskid': os.environ['TASK_ID'], 'taskFolder': os.getcwd()}}, open(r'z:\loan-request.json', 'wb'))"
    - >-
      c:\mozilla-build\python\python.exe -c "exec(\"import os.path, time\nwhile (not os.path.exists(r'z:\loan\credentials.txt.gpg')):\n time.sleep(1)\"")
metadata:
  name: Windows Loan Request
  description: Self service windows instance loan request
  owner: grenade@mozilla.com
  source: https://wiki.mozilla.org/index.php?title=ReleaseEngineering/How_To/Self_Provision_a_TaskCluster_Windows_Instance
tags: {}
extra: {}
- Change workerType to one of:
  - gecko-1-b-win2012: Windows Server 2012 R2 - Use for debugging Build issues
  - gecko-t-win10-64: Windows 10 Enterprise, x86_64 - Use for debugging Test issues
  - gecko-t-win10-64-gpu: Windows 10 Enterprise, x86_64 - Use for debugging Test issues where a GPU is required
  - gecko-t-win7-32: Windows 7 Enterprise, x86 - Use for debugging Test issues
  - gecko-t-win7-32-gpu: Windows 7 Enterprise, x86 - Use for debugging Test issues where a GPU is required
- Change REQUESTER_EMAIL to the email address associated with your GPG public key
- Change REQUESTER_PUBLIC_KEY_URL to a URL that will return a plain text (raw) copy of your GPG public key.
- Change owner to your own email address
- Click the Update Timestamps button at the bottom
- Click the Create Task button at the bottom
- When the task completes, navigate to the artifacts tab (select run 0 first) and download the public/test_info/credentials.txt.gpg artifact to your computer.
- Decrypt it with a command similar to:
gpg2 --decrypt credentials.txt.gpg
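The two inline python.exe commands in the loan request task are dense, so here is an expanded sketch of what they appear to do: the first writes a loan-request.json describing the requester, and the second waits until the encrypted credentials file exists. The function names and the timeout_seconds parameter are additions for illustration; the original one-liner polls without a timeout and is bounded only by the task's maxRunTime. The original opens the file in 'wb' mode, which suits Python 2; this sketch uses text mode so that it also runs on Python 3.

```python
import json
import os
import time


def build_loan_request(env, task_folder):
    """Build the structure that the first command dumps to loan-request.json."""
    return {
        "requester": {
            "email": env["REQUESTER_EMAIL"],
            "publickeyurl": env["REQUESTER_PUBLIC_KEY_URL"],
            "taskid": env["TASK_ID"],
            "taskFolder": task_folder,
        }
    }


def write_loan_request(path, env=os.environ, task_folder=None):
    """Write the loan request as JSON (the task writes z:\\loan-request.json)."""
    request = build_loan_request(env, task_folder or os.getcwd())
    with open(path, "w") as handle:
        json.dump(request, handle)
    return request


def wait_for_file(path, poll_seconds=1, timeout_seconds=None):
    """Poll until `path` exists; return False if the optional timeout passes.

    The timeout is an addition in this sketch, not part of the original command.
    """
    waited = 0
    while not os.path.exists(path):
        if timeout_seconds is not None and waited >= timeout_seconds:
            return False
        time.sleep(poll_seconds)
        waited += poll_seconds
    return True


if __name__ == "__main__":
    env = {
        "REQUESTER_EMAIL": "grenade@mozilla.com",
        "REQUESTER_PUBLIC_KEY_URL": "https://keybase.io/grenade/pgp_keys.asc",
        "TASK_ID": "example-task-id",  # placeholder value for the demo
    }
    print(json.dumps(build_loan_request(env, os.getcwd()), indent=2))
```

Keeping the task alive until credentials.txt.gpg appears is what lets the ..\loan directory be published as the public/test_info artifact.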
How do I connect to my borrowed instance?
Remote Desktop is the only currently supported mechanism. Further, the processes that manage cleanup of abandoned or unused loaner instances check for RDP sessions to determine if the instance is actually in use or if it has been abandoned.
Connecting from Windows
- GUI: Start > Remote Desktop Connection > enter the IP Address and credentials from the decrypted credentials.txt file and connect
- Command line:
mstsc /w:1024 /h:768 /v:<ip_address>
Connecting from Linux
Fedora
# install xfreerdp
sudo dnf install -y xfreerdp
# if you have an American English (en-US) keyboard:
xfreerdp /u:$username /p:"$password" /kbd:409 /w:1024 /h:768 +clipboard /v:$ip_address
# if you have a Queen's English (en-GB) keyboard:
xfreerdp /u:$username /p:"$password" /kbd:809 /w:1024 /h:768 +clipboard /v:$ip_address
For other keyboard layouts, see: https://github.com/FreeRDP/FreeRDP/blob/master/include/freerdp/locale/keyboard.h
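If you connect often, the xfreerdp invocation can be assembled programmatically. This is a small sketch that uses only the options shown in the commands above; the function name xfreerdp_args and its defaults are hypothetical. Building an argument list rather than a shell string avoids quoting problems with passwords that contain special characters.

```python
def xfreerdp_args(username, password, ip_address, kbd="409", width=1024, height=768):
    """Build an xfreerdp argument list from the decrypted loaner credentials.

    The defaults mirror the en-US example above; pass kbd="809" for en-GB.
    """
    return [
        "xfreerdp",
        "/u:%s" % username,
        "/p:%s" % password,
        "/kbd:%s" % kbd,
        "/w:%d" % width,
        "/h:%d" % height,
        "+clipboard",
        "/v:%s" % ip_address,
    ]


if __name__ == "__main__":
    # Placeholder credentials; 192.0.2.10 is a documentation-only address.
    print(" ".join(xfreerdp_args("user", "secret", "192.0.2.10", kbd="809")))
```

The resulting list can be handed to subprocess.call on a machine where xfreerdp is installed.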
Connecting from Mac OSX
Untested. Please edit this section if you know more.
open rdp://$username:$password@$ip_address:3389?screendepth###24:screenWidth###1024:screenHeight###768
see also: https://apple.stackexchange.com/a/54925/112073
Microsoft also has a Remote Desktop app for macOS (available on the app store at https://itunes.apple.com/us/app/microsoft-remote-desktop-10/id1295203466?mt=12) that works pretty well and is easy to use.
How do I return it when I'm done?
There's no need. Your loaned instance is an Amazon EC2 spot instance whose lifetime is finite. It can even die while it's in the middle of working for you, if the spot price is outbid. Think of it as a disposable minion whose survival depends on being busy and who will expire when exhausted or bored, or defect to a higher-paying villain at any time. If you don't create an RDP connection to your instance within 30 minutes of requesting it, it will die of boredom. If you disconnect your RDP session for more than 15 minutes, it will die of boredom. If you shut it down, it will die fulfilled.
When working on the loaner, it is advisable to save your intermediate results periodically and to script your actions, so that if the loaner disappears you can quickly get back to where you were on a new loaner.
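The two boredom rules can be restated as a small predicate. This is purely an illustration of the thresholds stated above, not the actual cleanup job that runs on the instance; the function name and parameters are hypothetical, and spot termination is not modelled.

```python
from datetime import datetime, timedelta

CONNECT_GRACE = timedelta(minutes=30)     # time allowed to open the first RDP session
DISCONNECT_GRACE = timedelta(minutes=15)  # time allowed without an RDP session


def is_abandoned(now, requested_at, first_connected_at=None, disconnected_at=None):
    """Apply the 30-minute and 15-minute rules described above."""
    if first_connected_at is None:
        # Never connected: abandoned once the connect grace period has passed.
        return now - requested_at > CONNECT_GRACE
    if disconnected_at is not None:
        # Connected earlier but currently disconnected.
        return now - disconnected_at > DISCONNECT_GRACE
    # An RDP session is currently active.
    return False


if __name__ == "__main__":
    requested = datetime(2017, 5, 24, 12, 0)
    print(is_abandoned(requested + timedelta(minutes=31), requested))
```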
How do I get help if it's not working?
- raise a bug in the RelOps component. cc or assign rthijssen@mozilla.com
- ping :grenade in #taskcluster on IRC (during normal working hours: 9 - 5 GMT+3)