Docker Worker

Docker task host for Linux.

Each task is evaluated in a restricted Docker container. Docker provides a number of utilities that make this work well. Because images are copy-on-write (COW), running any number of task hosts is plausible, and we can manage their overall usage.

We manipulate the Docker hosts through the Docker Remote API.

See the doc site for how to use the worker from an existing worker-type. The docs here are for hacking on the worker itself.

Requirements

  • Node >= 8.5.0
  • Docker
  • Packer (to build AMI)
    • Note that cracklib also has a utility named packer - run packer version to check!
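
As a rough sketch of the packer check mentioned above: the helper below inspects the first line of packer version output. It assumes HashiCorp Packer prints a line starting with "Packer v"; the function name is_hashicorp_packer is made up for this example.

```shell
# Succeed if the given text looks like the output of HashiCorp's `packer version`.
# Assumption: HashiCorp Packer prints a first line such as "Packer v1.8.0".
is_hashicorp_packer() {
  case "$1" in
    "Packer v"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

One possible way to use it is is_hashicorp_packer "$(packer version </dev/null 2>/dev/null | head -n 1)" || echo "packer on PATH is not HashiCorp Packer". Redirecting stdin from /dev/null is a precaution in case the cracklib tool tries to read input.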

Usage

# from the root of this repo (also see --help)
node bin/worker.js <config>

Configuration

The defaults contain all configuration options for the docker worker. In particular, these are important:

  • taskcluster: the credentials needed to authenticate all pull jobs from taskcluster.

  • pulse: the credentials for listening to pulse exchanges.

Directory Structure

Environment

docker-worker runs in an Ubuntu environment with various packages and kernel modules installed.

Within the root of the repo are a Vagrantfile and a vagrant.sh script that simplify creating a local environment that mimics the one used in production. This environment allows one to not only run the worker tests but also to run images used in TaskCluster in an environment similar to production without needing to configure special things on the host.

Loopback Devices

The v4l2loopback and snd-aloop kernel modules are installed to allow loopback audio/video devices to be available within tasks that require them. For information on how to configure these modules like production, consult the vagrant script used for creating a local environment.

Running tests

There are a few components that must be configured for the tests to work properly (e.g. docker, kernel modules, and other packages). A Vagrant environment is available to make this easy. Alternatively, it is possible to run tests outside of Vagrant, but this requires a bit more effort.

  1. Install VirtualBox
  2. Install Vagrant
  3. Install vagrant-reload by running vagrant plugin install vagrant-reload
  4. Within the root of the repo, run vagrant up
  5. vagrant ssh to enter the virtual machine

Setting up a standalone vm (non-Vagrant users)

If you can't use Vagrant (e.g. you are using Hyper-V and can't use VirtualBox), it is possible to configure a bare virtual machine in a very similar manner to what Vagrant would produce.

  1. Create a new virtual machine.
  2. Download an Ubuntu 14.04 server ISO and attach it to the VM
  3. Boot the VM
  4. Click through the Ubuntu installer dialogs
  5. For the primary username, use vagrant
  6. All other settings can pretty much be left at the defaults; you'll just press ENTER a bunch of times during the install wizard. However, you'll probably want to install OpenSSH server on the Software selection screen so you can SSH into your VM.
  7. On first boot, run sudo visudo and modify the end of the %sudo line so it contains NOPASSWD:ALL instead of just ALL. This allows you to sudo without typing a password.
  8. sudo apt-get install git
  9. git clone https://github.com/taskcluster/docker-worker ~/docker-worker
  10. sudo ln -s /home/vagrant/docker-worker /vagrant
  11. sudo ln -s /home/vagrant/docker-worker /worker
  12. cd docker-worker
  13. ./vagrant.sh -- this will provision the VM by installing a bunch of packages and dependencies.
  14. sudo reboot -- this is necessary to activate the updated kernel.
  15. sudo depmod
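
The sudoers change in step 7 can be illustrated as a text transformation. This is only a sketch: it assumes the stock Ubuntu line %sudo ALL=(ALL:ALL) ALL, the function name allow_passwordless_sudo is invented here, and the real edit should still be made through sudo visudo so that syntax errors are caught.

```shell
# Rewrite a "%sudo ... ALL" sudoers line so that it ends in NOPASSWD:ALL.
# Lines that already end in NOPASSWD:ALL, and unrelated lines, pass through unchanged.
allow_passwordless_sudo() {
  printf '%s\n' "$1" |
    sed 's/^\(%sudo[[:space:]].*[[:space:]]\)ALL$/\1NOPASSWD:ALL/'
}
```

For example, passing the stock line yields %sudo ALL=(ALL:ALL) NOPASSWD:ALL, which is the line you would type in visudo.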

Logging into virtual machine and configuring environment

Many tests require the TASKCLUSTER_ACCESS_TOKEN, TASKCLUSTER_CLIENT_ID, PULSE_USERNAME, and PULSE_PASSWORD environment variables. These variables define credentials used to connect to external services.

To obtain Taskcluster client credentials, run eval $(cat scopes.txt | xargs taskcluster-cli signin). This will open a web browser and you'll be prompted to log into Taskcluster. This command requires the taskcluster-cli Go application. Find one at https://github.com/taskcluster/taskcluster-cli/releases.

Pulse credentials can be created at https://pulseguardian.mozilla.org/.

If using Vagrant, setting these environment variables in the shell used to run vagrant ssh will cause the variables to get inherited inside the Vagrant VM. If not using Vagrant, you should add export VAR=value lines to /home/vagrant/.bash_profile.
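
Before starting the tests it can help to confirm that all four variables are present. The following is a minimal sketch, not part of the repo; the function name missing_credentials is hypothetical.

```shell
# Print the name of every required credential variable that is unset or empty.
missing_credentials() {
  for name in TASKCLUSTER_CLIENT_ID TASKCLUSTER_ACCESS_TOKEN PULSE_USERNAME PULSE_PASSWORD; do
    # Indirect lookup of the variable named by $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

Running missing_credentials inside the VM prints nothing when everything is set; any name it prints still needs to be exported.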

From the virtual machine, you'll need to install some application-level dependencies:

  1. cd /vagrant
  2. ./build.sh -- builds some Docker images
  3. yarn install --frozen-lockfile -- installs Node modules

Running Tests

  1. All the tests can be run with yarn test or ./test/test.sh. Under most circumstances, however, one only wants to run a single test suite.
  2. For individual test files, run ./node_modules/mocha/bin/mocha --bail test/<file>
  3. To run individual tests within a test file, add --grep <pattern> to the above command, where <pattern> matches the individual test name.

Note: Sometimes things don't go as planned and tests will hang until they time out. To get more insight into what went wrong, set DEBUG=* when running the tests to get more detailed output.
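
To show how these pieces combine, here is a small sketch that prints (without running) a mocha command line for one test file. It assumes the worker uses the common debug module, where DEBUG=* enables all output; the function name mocha_command and the file name in the usage note are made up for illustration.

```shell
# Print (without running) the mocha command for a single test file.
# $1: path to the test file, $2: optional test-name pattern for --grep.
# Assumption: DEBUG=* turns on verbose output, as with the common "debug" module.
mocha_command() {
  cmd="DEBUG=* ./node_modules/mocha/bin/mocha --bail $1"
  if [ -n "${2:-}" ]; then
    # Single quotes keep the pattern as one argument; patterns that
    # themselves contain single quotes are not handled by this sketch.
    cmd="$cmd --grep '$2'"
  fi
  printf '%s\n' "$cmd"
}
```

For instance, mocha_command test/example_test.js "my test name" prints a command that can be copied into the shell from the root of the repo.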

Common problems

  • Time synchronization: if you're running docker in a VM, the VM's clock may drift. This often results in stale warnings on the queue.
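
A quick way to reason about drift is to compare a timestamp taken in the VM with one taken on a trusted machine. The helper below is a sketch only; the name clock_drift_exceeds and any threshold you pass are illustrative choices, not values used by the worker or the queue.

```shell
# Succeed when two Unix timestamps differ by more than a threshold.
# $1: local epoch seconds, $2: reference epoch seconds, $3: allowed drift in seconds.
clock_drift_exceeds() {
  drift=$(( $1 - $2 ))
  if [ "$drift" -lt 0 ]; then
    drift=$(( 0 - drift ))
  fi
  [ "$drift" -gt "$3" ]
}
```

Both timestamps can be taken with date -u +%s, once inside the VM and once on the host.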

Updating Documentation

Documentation for this project lives under docs/. Upon merging, documentation will be uploaded to S3 automatically for display on docs.taskcluster.net.

Deployment

The below is a detailed guide to how deployment works. If you know what you're doing and just need a check list, see: deployment check list

Requirements

  • packer
  • make
  • node >= 8.5.0
  • credentials for required services
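
The node version requirement can be checked with a small comparison helper. This is a sketch that assumes plain numeric versions such as v8.5.0 (no pre-release suffixes); the function name version_ge is invented for this example.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Accepts an optional leading "v", as printed by `node --version`.
version_ge() {
  # Drop the leading "v" and pad so that every version has three fields.
  have="${1#v}.0.0"
  want="${2#v}.0.0"
  for field in 1 2 3; do
    h=$(printf '%s\n' "$have" | cut -d. -f"$field")
    w=$(printf '%s\n' "$want" | cut -d. -f"$field")
    if [ "$h" -gt "$w" ]; then return 0; fi
    if [ "$h" -lt "$w" ]; then return 1; fi
  done
  return 0
}
```

A typical call would be version_ge "$(node --version)" 8.5.0 || echo "node is too old" >&2.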

Amazon Credentials

docker-worker is currently deployed to AWS EC2. Using packer to configure and deploy an AMI requires Amazon credentials to be specified. Follow this document to configure the environment appropriately.

Deploying

You will need passwordstore set up, and the aws CLI tool configured. Then run ./deploy.sh. It should do the rest!

Building AMIs

The docker worker deploy script is essentially a wrapper around packer with an interactive configuration script to ensure you're not missing particular environment variables. There are two primary workflows that are important.

  1. Building the base AMI. Do this when:

    • You need to add new apt packages.

    • You need to update docker (see above).

    • You need to run some expensive one-off installation.

    • You need to update ssl/gpg keys

      Note that you need to manually update the sourceAMI field in the app.json file after you create a new base AMI.

      Also note that generating this base AMI requires access to the ssl and gpg keys that the worker needs.

      Example:

      ./deploy/bin/build base
      
  2. Building the app AMI. Do this when:

    • You want to test new code/features.

    • You need to update statsd/configs (not packages).

      Note: Just because you deploy an AMI does not mean anyone is using it. Usually you also need to update a provisioner workerType with the new AMI id.

      Example:

      ./deploy/bin/build app
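
The two workflows above amount to a simple decision table, sketched below. The change labels (apt-packages, code, and so on) and the function name ami_for_change are hypothetical names chosen for this example; only the base and app targets come from the build script.

```shell
# Map a kind of change to the AMI that needs rebuilding, following the lists above.
# The change labels are hypothetical names used only in this sketch.
ami_for_change() {
  case "$1" in
    apt-packages|docker-upgrade|one-off-install|ssl-gpg-keys) echo base ;;
    code|statsd-config) echo app ;;
    *) echo "unknown change: $1" >&2; return 1 ;;
  esac
}
```

It could be combined with the build script as ./deploy/bin/build "$(ami_for_change code)". After a base build, the sourceAMI field still has to be updated by hand before the app AMI is rebuilt.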
      

Deployment Implementation

Everything related to the deployment of the worker is in the deploy folder, which has a number of important subfolders and files.

  • deploy/packer : The packer folder contains the list of AMIs (app/base) which need to be created. Typically you only need to build the "app" AMI, which is built on a pre-existing base AMI (see sourceAMI in app.json).

  • deploy/variables.js : contains the list of variables for the deployment and possible defaults

  • deploy/template : This folder is a mirror of what will be deployed on the server, but with mustache-like variables (see variables.js for the list of all possible variables). If you need to add a script, config, etc., add it here in one of the subfolders.

  • deploy/deploy.json : A generated file (created by running deploy/bin/build or make -C deploy) that contains all the variables needed to deploy the application. The script deploy/bin/import-docker-worker-secrets generates the file from password store.

  • deploy/target : Contains the final files to be uploaded when creating the AMI, with all template values substituted. It is useful to check this by running make -C deploy prior to building the full AMI.

  • deploy/bin/build : The script responsible for invoking packer with the correct arguments and creating the artifacts which need to be uploaded to the AMI.

  • deploy/bin/update-worker-types.js : after running deploy/bin/build app, run this script to update aws-provisioner with the new AMIs. It creates a backup file with current worker-types configuration and kills the worker-types running instances. It requires node 8.5.0+.

  • deploy/bin/rollback-worker-types.js : Given the backup file, this script rolls back the worker type configuration. It requires node 8.5.0+.

  • deploy/bin/github-release.js : It creates a GitHub release of the current branch. Do not use this script directly; use the release.sh script, which does some safety checks before releasing.
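
To give a feel for how files in deploy/template become files in deploy/target, here is a rough sketch of a single variable substitution. It assumes placeholders are written as {{name}}, which is the usual mustache form but is not confirmed here; the function name render_variable is invented, and the real substitution is done by make -C deploy.

```shell
# Replace every {{name}} placeholder in the text on stdin with a value.
# $1: variable name, $2: replacement value.
# Assumption: placeholders use the {{name}} form.
render_variable() {
  # Escape characters that are special in the replacement part of sed's s command.
  escaped=$(printf '%s' "$2" | sed 's/[&/\]/\\&/g')
  sed "s/{{$1}}/$escaped/g"
}
```

For example, printf 'dir={{dir}}\n' | render_variable dir /mnt prints dir=/mnt.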

Block-Device Mapping

The AMI built with packer will mount all available instance storage under /mnt and use this for storing docker images and containers. In order for this to work you must specify a block device mapping that maps ephemeral[0-9] to /dev/sd[b-z].

It should be noted that they'll appear in the virtual machine as /dev/xvd[b-z], as this is how Xen storage devices are named under newer kernels. However, the format and mount script will mount them all as a single partition on /mnt using LVM.

An example block device mapping looks as follows:

  {
  "BlockDeviceMappings": [
      {
        "DeviceName": "/dev/sdb",
        "VirtualName": "ephemeral0"
      },
      {
        "DeviceName": "/dev/sdc",
        "VirtualName": "ephemeral1"
      }
    ]
  }
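
A mapping like the one above can be generated for any number of ephemeral volumes. The sketch below prints compact JSON and also shows the /dev/sd to /dev/xvd renaming described earlier; both function names are hypothetical, and the count is assumed to be between 0 and 25 so that a device letter from b to z exists for each volume.

```shell
# Print a block device mapping for the first $1 ephemeral volumes,
# pairing ephemeral0 with /dev/sdb, ephemeral1 with /dev/sdc, and so on.
block_device_mappings() {
  letters=bcdefghijklmnopqrstuvwxyz
  entries=""
  i=0
  while [ "$i" -lt "$1" ]; do
    # Pick the (i+1)-th letter of the b..z sequence.
    letter=$(printf '%s\n' "$letters" | cut -c "$((i + 1))")
    entry="{\"DeviceName\":\"/dev/sd$letter\",\"VirtualName\":\"ephemeral$i\"}"
    entries="${entries:+$entries,}$entry"
    i=$((i + 1))
  done
  printf '{"BlockDeviceMappings":[%s]}\n' "$entries"
}

# Translate an EC2 device name such as /dev/sdb to the name seen under Xen.
xen_device_name() {
  printf '%s\n' "$1" | sed 's|^/dev/sd|/dev/xvd|'
}
```

Calling block_device_mappings 2 produces the same two entries as the example above, just without the whitespace.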

Updating Schema

Schema changes are not deployed automatically, so if the schema has been changed, run the upload-schema.js script to update it.

Before running the upload schema script, ensure that AWS credentials are loaded into your environment. See Configuring AWS with Node.

Run the upload-schema.js script to update the schema:

babel-node bin/upload-schema.js

Post-Deployment Verification

After creating a new AMI, operation can be verified by updating a test worker type in the AWS Provisioner and submitting tasks to it. Ensure that the tasks were claimed and completed with a successful outcome. Also add features/capabilities to the tasks based on the code changes made in this release.

Further verification should be done if underlying packages, such as docker, change. Stress tests should be used (submit a graph with 1000 tasks) to ensure that all tasks have the expected outcome and complete in an expected amount of time.

Errors from docker-worker are reported into papertrail and should be monitored during roll out of new AMIs. Searching for the AMI Id along with ("task resolved" OR "claim task") should give a rough idea if work is being done using these new AMIs.
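
The search string described above can be built from an AMI id as follows. This is a small sketch; the function name papertrail_query and the AMI id in the usage note are placeholders.

```shell
# Print the papertrail search string for a given AMI id.
papertrail_query() {
  printf '%s ("task resolved" OR "claim task")\n' "$1"
}
```

For example, papertrail_query ami-0123456789abcdef0 prints a string that can be pasted into the papertrail search box.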