Build an SRE compute image

These instructions walk you through creating a new virtual machine (VM) image for use in the Secure Research Environment (SRE).

Explanation of symbols used in this guide

PowerShell command

PowerShell: estimate of time needed

  • This indicates a PowerShell command which you will need to run locally on your machine.

  • Ensure you have checked out (or downloaded) the appropriate tag of the Safe Haven repository from alan-turing-institute/data-safe-haven.

  • Open a PowerShell terminal and navigate to the indicated directory of your locally checked-out version of the Safe Haven repository.

  • Ensure that you are logged in to Azure by running the Connect-AzAccount command.

    Tip

    If your account is a guest in additional Azure tenants, you may need to add the -Tenant <Tenant ID> flag, where <Tenant ID> is the ID of the Azure tenant you want to deploy into.

  • This command will give you a URL and a short alphanumeric code.

  • Go to the URL in a web browser, enter the code and log in to your Azure account.

    Tip

    If you have several Azure accounts, make sure you use one that has permissions to make changes to the subscription you are using

Remote command

Remote: estimate of time needed

  • This indicates a command which you will need to run remotely on an Azure virtual machine (VM) using Microsoft Remote Desktop

  • Open Microsoft Remote Desktop and click Add Desktop / Add PC

  • Enter the private IP address of the VM that you need to connect to in the PC name field (this can be found by looking in the Azure portal)

  • Enter the name of the VM (for example DC1-SHM-PROJECT) in the Friendly name field

  • Click Add

  • Ensure you are connected to the SHM VPN that you have set up

  • Double-click the desktop that appears under Saved Desktops or PCs.

  • Use the username and password specified by the appropriate section of the guide

Tip

If you see a warning dialog that the certificate cannot be verified as root, accept this and continue.

Azure Portal operation

Portal: estimate of time needed

  • This indicates an operation which needs to be carried out in the Azure Portal using a web browser on your local machine.

  • You will need to log in to the portal using an account with privileges to make the necessary changes to the resources you are altering.

Microsoft Entra ID operation

Microsoft Entra ID: estimate of time needed

  • This indicates an operation which needs to be carried out in the Azure Portal using a web browser on your local machine.

  • You will need to log in to the portal using an account with administrative privileges on the Microsoft Entra ID tenant that you are altering.

  • Note that this might be different from the account which is able to create/alter resources in the Azure subscription where you are building the Safe Haven.

OS-dependent steps

The following icons indicate steps that depend on the OS you are using to deploy the SHM

  • macOS

  • Windows

  • Linux

1. 🌱 Prerequisites

Hint

If you run:

PS> Start-Transcript -Path <a log file>

before you start your deployment and

PS> Stop-Transcript

afterwards, you will automatically get a full log of the PowerShell commands you have run and their output.
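If you would rather not rely on remembering to call Stop-Transcript, the two commands can be wrapped so that the transcript is always closed, even when a deployment script throws an error. This is a minimal sketch: the function name Invoke-WithTranscript and the dsh-deployment-<timestamp>.log file name are illustrative assumptions, not part of the Safe Haven repository.

```powershell
# Hypothetical helper (not part of the Safe Haven repository): run a script block
# inside a transcript and always stop the transcript, even if the block throws.
function Invoke-WithTranscript {
    param(
        [Parameter(Mandatory)][scriptblock]$ScriptBlock,
        [string]$LogDirectory = (Get-Location).Path
    )
    # Timestamped file name so that repeated runs do not overwrite each other
    $fileName = 'dsh-deployment-{0}.log' -f (Get-Date -Format 'yyyyMMdd-HHmmss')
    $logPath = Join-Path $LogDirectory $fileName
    Start-Transcript -Path $logPath | Out-Null
    try {
        # Send output to the host so it is shown on screen and recorded in the transcript
        & $ScriptBlock | Out-Host
    }
    finally {
        Stop-Transcript | Out-Null
    }
    # Return only the log location so it can be kept with your deployment records
    $logPath
}
```

For example, running `Invoke-WithTranscript { ./Provision_Compute_VM.ps1 -shmId <SHM ID> }` from the setup directory returns the path of the log file once the script finishes.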

(Optional) Verify code version

If you have cloned or forked the code from our GitHub repository, you can confirm which version of the Data Safe Haven you are currently using by running the following command:

PowerShell: a few seconds

PS> git tag --list | Select-String $(git describe --tags)

This checks the tag you are using against the list of known tags and prints it. If nothing is printed, your checkout is not exactly at a tagged release. You can include this confirmation in any record you keep of your deployment.
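The command only prints a match when your checkout sits exactly on a tag: if you are some commits past a tag, git describe --tags returns <tag>-<number of commits>-g<abbreviated commit hash>, which matches none of the listed tags. The sketch below splits that output so you can record how far you are from the nearest tag; Get-TagDescription is a hypothetical helper, not a script in the repository.

```powershell
# Hypothetical helper: interpret the output of `git describe --tags`.
# An exact tag is returned unchanged; otherwise git appends -<commits>-g<hash>.
function Get-TagDescription {
    param([Parameter(Mandatory)][string]$Description)
    if ($Description -match '^(?<tag>.+)-(?<commits>\d+)-g(?<hash>[0-9a-f]+)$') {
        [pscustomobject]@{
            Tag          = $Matches['tag']
            CommitsAhead = [int]$Matches['commits']
            Commit       = $Matches['hash']
            IsExactTag   = $false
        }
    }
    else {
        [pscustomobject]@{
            Tag          = $Description
            CommitsAhead = 0
            Commit       = $null
            IsExactTag   = $true
        }
    }
}
```

For example, `Get-TagDescription -Description (git describe --tags)` reports the nearest tag and whether you are exactly on it.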

2. 🎁 (Optional) Customise the build configuration

Provisioning a VM with all the Safe Haven software is done using cloud-init. This takes a basic Ubuntu image and installs and configures all the necessary software packages. In general, this image should cover most use cases, but you may want to customise it for your particular circumstances, for example to add a new package or to update the version of an existing package.

Adding a new apt package

  • Add the name of the package to deployment/secure_research_desktop/packages/packages-apt.list

  • If this package adds a new executable that you would like to be available to the end user, you should also add a check for this to the end of deployment/secure_research_desktop/cloud_init/cloud-init-buildimage-ubuntu-<version>.mustache.yaml

Hint

For example, to check for Azure Data Studio, the following line was added:

if [ "$(which azuredatastudio)" ]; then echo "\n\n*azuredatastudio*\n\n$(which azuredatastudio)"; else echo "ERROR azuredatastudio not found!"; exit 1; fi

Adding a new Python package

  • Add the name of the package as it appears on PyPI to the package list:

    • deployment/secure_research_desktop/packages/packages-python.yaml

    • If there are any restrictions on acceptable versions for this package (e.g. a minimum or exact version) then make sure to specify this

  • You should also add this package to the allow list used by Tier 3 package mirrors in environment_configs/package_lists/allowlist-core-python-pypi-tier3.list

Adding a new R package

  • Add the name of the package as it appears on CRAN or Bioconductor to the appropriate package list:

    • deployment/secure_research_desktop/packages/packages-r-bioconductor.list

    • deployment/secure_research_desktop/packages/packages-r-cran.list

  • If this R package is available as a pre-compiled apt binary (e.g. abind is available as r-cran-abind), then also add it to deployment/secure_research_desktop/packages/packages-apt.list.

  • You should also add this package to the allow list used by Tier 3 package mirrors in environment_configs/package_lists/allowlist-core-r-cran-tier3.list
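Because each package has to be added in two places, it can be worth checking that nothing in the SRD package list is missing from the Tier 3 allowlist. The sketch below assumes both files contain one package name per line, with blank lines and lines beginning with # ignored; Read-PackageNames and Get-MissingAllowlistEntry are hypothetical helpers, not scripts shipped with the Safe Haven.

```powershell
# Hypothetical helpers: list package names present in a package list but absent from an allowlist.
# Assumption: one package name per line; blank lines and '#' comment lines are skipped.
function Read-PackageNames {
    param([Parameter(Mandatory)][string]$Path)
    Get-Content -LiteralPath $Path |
        ForEach-Object { $_.Trim() } |
        Where-Object { $_ -and -not $_.StartsWith('#') }
}

function Get-MissingAllowlistEntry {
    param(
        [Parameter(Mandatory)][string]$PackageListPath,
        [Parameter(Mandatory)][string]$AllowlistPath
    )
    # Load the allowlist into a (case-sensitive) set for fast lookups
    $allowed = [System.Collections.Generic.HashSet[string]]::new()
    foreach ($name in (Read-PackageNames -Path $AllowlistPath)) { [void]$allowed.Add($name) }
    # Emit every package that the allowlist does not mention
    Read-PackageNames -Path $PackageListPath | Where-Object { -not $allowed.Contains($_) }
}
```

For CRAN packages you might run it as `Get-MissingAllowlistEntry -PackageListPath deployment/secure_research_desktop/packages/packages-r-cran.list -AllowlistPath environment_configs/package_lists/allowlist-core-r-cran-tier3.list`; any names it prints still need adding to the allowlist.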

Adding packages to the package allowlist

  • When you add a new package to either the PyPI or CRAN allowlist you should also determine all of its dependencies (and their dependencies, recursively)

  • Once you have the list of packages you should add them to:

    • PyPI: environment_configs/package_lists/allowlist-full-python-pypi-tier3.list

    • CRAN: environment_configs/package_lists/allowlist-full-r-cran-tier3.list
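Finding all dependencies recursively is a graph traversal: starting from the new package, keep visiting dependencies you have not seen before until none remain. The sketch below does this breadth-first over a dependency map that you are assumed to have built yourself (the hashtable layout and the function name Get-TransitiveDependency are hypothetical). For CRAN packages, R's own tools::package_dependencies() with recursive = TRUE performs the same walk against the CRAN package database.

```powershell
# Hypothetical helper: breadth-first walk of a dependency map to find every package
# that must be allowlisted alongside the requested ones.
# $DependencyMap maps a package name to the names of its direct dependencies.
function Get-TransitiveDependency {
    param(
        [Parameter(Mandatory)][string[]]$Package,
        [Parameter(Mandatory)][hashtable]$DependencyMap
    )
    $seen = [System.Collections.Generic.HashSet[string]]::new()
    $queue = [System.Collections.Generic.Queue[string]]::new()
    foreach ($name in $Package) {
        if ($seen.Add($name)) { $queue.Enqueue($name) }
    }
    while ($queue.Count -gt 0) {
        $current = $queue.Dequeue()
        # Packages missing from the map are treated as having no dependencies
        foreach ($dependency in @($DependencyMap[$current])) {
            # The set check prevents revisiting packages, so dependency cycles terminate
            if ($dependency -and $seen.Add($dependency)) { $queue.Enqueue($dependency) }
        }
    }
    # Sorted, de-duplicated list that includes the requested packages themselves
    $seen | Sort-Object
}
```

Every name in the result should then appear in the relevant full allowlist.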

Changing the version of a package

If you want to update the version of one of the packages we install from a .deb file (e.g. RStudio), you will need to edit deployment/secure_research_desktop/cloud_init/cloud-init-buildimage-ubuntu-<version>.mustache.yaml:

  • Find the appropriate /installation/<package name>.debinfo section under the write_files: key

  • Update the version number and the sha256 hash for the file

  • Check that the file naming structure still matches the format described in this .debinfo file
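The new SHA-256 hash can be computed locally once you have downloaded the new .deb file (for example with Invoke-WebRequest -Uri <URL> -OutFile <file>). This minimal sketch uses PowerShell's built-in Get-FileHash; the helper name is hypothetical, and lower-casing the result is an assumption based on the format printed by sha256sum, so compare it with the existing entries in the .debinfo section.

```powershell
# Hypothetical helper: compute the SHA-256 hash of a downloaded .deb file.
# Get-FileHash returns upper-case hex; it is lower-cased here to match the
# format printed by sha256sum (assumption: check against the existing .debinfo entries).
function Get-DebSha256 {
    param([Parameter(Mandatory)][string]$Path)
    (Get-FileHash -LiteralPath $Path -Algorithm SHA256).Hash.ToLowerInvariant()
}
```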

3. 👷 Build a release candidate

To provision a candidate VM, run the following:

PowerShell: two to three hours at 📁 ./deployment/secure_research_desktop/setup

PS> ./Provision_Compute_VM.ps1 -shmId <SHM ID>

Note

  • Although the ./Provision_Compute_VM.ps1 script will finish running in a few minutes, the build itself will take several hours.

  • We recommend monitoring the build by accessing the machine over SSH (the SSH connection details should be printed at the end of the Provision_Compute_VM.ps1 script) and either reading through the full build log at /var/log/cloud-init-output.log or running the summary script /opt/monitoring/analyse_build.py.

  • NB. You will need to connect from an approved administrator IP address.

  • NB. The VM will automatically shut down at the end of the cloud-init process. If you want to analyse the build after this point, you will need to start it again in the Azure portal.
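The executable check shown in step 2 prints a line beginning with ERROR when something is missing, so such lines are a quick thing to look for in the build log. The sketch below assumes you have copied /var/log/cloud-init-output.log from the candidate VM to your own machine (for example with scp over the same SSH connection); Get-BuildLogError is a hypothetical helper and is not a replacement for /opt/monitoring/analyse_build.py.

```powershell
# Hypothetical helper: list matching lines (with line numbers) from a local copy of the build log.
# The default pattern matches lines that start with ERROR, as printed by the executable checks.
function Get-BuildLogError {
    param(
        [Parameter(Mandatory)][string]$LogPath,
        [string]$Pattern = '^ERROR'
    )
    Select-String -LiteralPath $LogPath -Pattern $Pattern -CaseSensitive |
        ForEach-Object { '{0}: {1}' -f $_.LineNumber, $_.Line }
}
```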

Error

  • If you are unable to access the VM over SSH, check whether you are connecting from one of the approved IP addresses that you defined under vmImages > buildIpAddresses in the SHM config file.

  • You can check which IP addresses are currently allowed by looking at the AllowBuildAdminSSH inbound security rule in the RG_VMIMAGES_NETWORKING > NSG_VMIMAGES_BUILD_CANDIDATES network security group in the subscription where you are building the candidate VM.
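To check whether the address you are connecting from is covered by one of the allowed prefixes, you can compare them in CIDR notation. This is a minimal IPv4-only sketch with hypothetical function names; it assumes each prefix is either a single address or an a.b.c.d/n range and does not handle Azure service tags or the * wildcard.

```powershell
# Hypothetical helpers (IPv4 only): test whether an address falls inside a CIDR prefix
# such as 203.0.113.0/24. A prefix without "/n" is treated as a single address (/32).
function ConvertTo-IPv4Number {
    param([Parameter(Mandatory)][string]$Address)
    $ip = [System.Net.IPAddress]::Parse($Address)
    if ($ip.AddressFamily -ne [System.Net.Sockets.AddressFamily]::InterNetwork) {
        throw "Only IPv4 addresses are supported: $Address"
    }
    # Combine the four bytes (network byte order) into one integer
    $value = [int64]0
    foreach ($byte in $ip.GetAddressBytes()) { $value = $value * 256 + $byte }
    $value
}

function Test-IPv4InPrefix {
    param(
        [Parameter(Mandatory)][string]$IpAddress,
        [Parameter(Mandatory)][string]$Prefix
    )
    $network, $length = $Prefix.Split('/')
    $prefixLength = if ($length) { [int]$length } else { 32 }
    if ($prefixLength -lt 0 -or $prefixLength -gt 32) { throw "Invalid prefix length in $Prefix" }
    # Network mask with the top $prefixLength bits set, e.g. /24 -> 255.255.255.0
    $mask = [int64][math]::Pow(2, 32) - [int64][math]::Pow(2, 32 - $prefixLength)
    ((ConvertTo-IPv4Number $IpAddress) -band $mask) -eq ((ConvertTo-IPv4Number $network) -band $mask)
}
```

The prefixes to test against are in the SourceAddressPrefix property of the rule returned by `Get-AzNetworkSecurityGroup -ResourceGroupName RG_VMIMAGES_NETWORKING -Name NSG_VMIMAGES_BUILD_CANDIDATES | Get-AzNetworkSecurityRuleConfig -Name AllowBuildAdminSSH`; the address to test is the public IP address that Azure sees when you connect.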

4. 📷 Convert candidate VM to an image

Once you are happy with a particular candidate, you can convert it into an image as follows:

PowerShell: ten minutes at 📁 ./deployment/secure_research_desktop/setup

PS> ./Convert_VM_To_Image.ps1 -shmId <SHM ID> -vmName <VM name>
  • where <SHM ID> is the management environment ID for this SRE

  • where <VM name> is the name of the virtual machine created during the provisioning step

This will build a new image in RG_VMIMAGES_STORAGE and delete the VM and its associated build artifacts (hard disk, network card and public IP address).

Note

The first step of this script will run the remote build analysis script. Please check that everything has built correctly before proceeding.