Build an SRE compute image
These instructions will walk you through creating a new VM image for use in the secure research environment.
Explanation of symbols used in this guide
PowerShell command
This indicates a PowerShell command which you will need to run locally on your machine.
Ensure you have checked out (or downloaded) the appropriate tag of the Safe Haven repository from alan-turing-institute/data-safe-haven.
Open a PowerShell terminal and navigate to the indicated directory of your locally checked-out version of the Safe Haven repository.
Ensure that you are logged into Azure by running the Connect-AzAccount command.
Tip
If your account is a guest in additional Azure tenants, you may need to add the -Tenant <Tenant ID> flag, where <Tenant ID> is the ID of the Azure tenant you want to deploy into.
This command will give you a URL and a short alphanumeric code.
Go to the URL in a web browser, enter the code and log in to your account on Azure.
Tip
If you have several Azure accounts, make sure you use one that has permissions to make changes to the subscription you are using.
Remote command
This indicates a command which you will need to run remotely on an Azure virtual machine (VM) using Microsoft Remote Desktop.
Open Microsoft Remote Desktop and click Add Desktop / Add PC.
Enter the private IP address of the VM that you need to connect to in the PC name field (this can be found by looking in the Azure portal).
Enter the name of the VM (for example DC1-SHM-PROJECT) in the Friendly name field.
Click Add.
Ensure you are connected to the SHM VPN that you have set up.
Double click on the desktop that appears under Saved Desktops or PCs.
Use the username and password specified by the appropriate section of the guide.
Tip
If you see a warning dialog that the certificate cannot be verified as root, accept this and continue.
Azure Portal operation
This indicates an operation which needs to be carried out in the Azure Portal using a web browser on your local machine.
You will need to log in to the portal using an account with privileges to make the necessary changes to the resources you are altering.
Microsoft Entra ID operation
This indicates an operation which needs to be carried out in the Azure Portal using a web browser on your local machine.
You will need to log in to the portal using an account with administrative privileges on the Microsoft Entra ID that you are altering.
Note that this might be different from the account which is able to create/alter resources in the Azure subscription where you are building the Safe Haven.
OS-dependent steps
The following icons indicate steps that depend on the OS you are using to deploy the SHM
MacOS
Windows
Linux
1. Prerequisites
An Azure subscription with sufficient credits to build the environment in
PowerShell for Azure
  Install PowerShell v6.0 or above
  Install the Azure PowerShell Module
SSH or OpenSSH (not tested on Windows)
SHM configuration file
  The core properties for the environment must be present in the environment_configs folder as described in the Safe Haven Management deployment instructions.
Hint
If you run:
PS> Start-Transcript -Path <a log file>
before you start your deployment and
PS> Stop-Transcript
afterwards, you will automatically get a full log of the PowerShell commands you have run.
(Optional) Verify code version
If you have cloned/forked the code from our GitHub repository, you can confirm which version of the Data Safe Haven you are currently using by running the following command:
PS> git tag --list | Select-String $(git describe --tags)
This will check the tag you are using against the list of known tags and print it out. You can include this confirmation in any record you keep of your deployment.
2. (Optional) Customise the build configuration
Provisioning a VM with all the Safe Haven software is done using cloud-init. This takes a basic Ubuntu image and installs and configures all the necessary software packages. In general, this image should cover most use cases, but it's possible that you may want to customise it for your particular circumstances, for example if you want to add a new package or to update the version of an existing package.
Adding a new apt package
Add the name of the package to deployment/secure_research_desktop/packages/packages-apt.list
If this package adds a new executable that you would like to be available to the end user, you should also add a check for this to the end of deployment/secure_research_desktop/cloud_init/cloud-init-buildimage-ubuntu-<version>.mustache.yaml
Hint
For example, to check for Azure Data Studio, the following line was added:
if [ "$(which azuredatastudio)" ]; then echo "\n\n*azuredatastudio*\n\n$(which azuredatastudio)"; else echo "ERROR azuredatastudio not found!"; exit 1; fi
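The one-off check above can be generalised to any executable. The following is a minimal sketch and not part of the repository: check_executable is a hypothetical helper name, and it uses command -v instead of which and printf instead of echo so that the newlines are printed the same way in any POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Safe Haven repository):
# print the location of an executable, or report an error if it is missing.
check_executable() {
    name="$1"
    # command -v prints the location of the executable, or nothing if it is absent
    path="$(command -v "$name" || true)"
    if [ -n "$path" ]; then
        printf '\n\n*%s*\n\n%s\n' "$name" "$path"
    else
        echo "ERROR $name not found!"
        return 1
    fi
}

# Example: stop with a non-zero exit code if the tool is missing
# check_executable azuredatastudio || exit 1
```

The function returns a non-zero status instead of calling exit directly, so the caller decides whether a missing tool should abort the script.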
Adding a new Python package
Add the name of the package as it appears on PyPI to the package list: deployment/secure_research_desktop/packages/packages-python.yaml
If there are any restrictions on acceptable versions for this package (e.g. a minimum or exact version) then make sure to specify this.
You should also add this package to the allow list used by Tier 3 package mirrors in environment_configs/package_lists/allowlist-core-python-pypi-tier3.list
Adding a new R package
Add the name of the package as it appears on CRAN or Bioconductor to the appropriate package list:
  deployment/secure_research_desktop/packages/packages-r-bioconductor.list
  deployment/secure_research_desktop/packages/packages-r-cran.list
If this R package is available as a pre-compiled apt binary (e.g. abind is available as r-cran-abind) then also add it to deployment/secure_research_desktop/packages/packages-apt.list.
You should also add this package to the allow list used by Tier 3 package mirrors in environment_configs/package_lists/allowlist-core-r-cran-tier3.list
Adding packages to the package allowlist
When you add a new package to either the PyPI or CRAN allowlist you should also determine all of its dependencies (and their dependencies, recursively).
Once you have the list of packages you should add them to:
  PyPI: environment_configs/package_lists/allowlist-full-python-pypi-tier3.list
  CRAN: environment_configs/package_lists/allowlist-full-r-cran-tier3.list
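How you work out the dependencies depends on your own tooling and is not shown here. Once you have the names, a small helper can append them to an allowlist without creating duplicates. This is a sketch under stated assumptions: add_to_allowlist is a hypothetical name, the allowlist is assumed to hold one package name per line and to end with a newline, and names are compared exactly (case-sensitively).

```shell
#!/bin/sh
# Hypothetical helper (not part of the Safe Haven repository):
# append package names to an allowlist file, skipping names already present.
# Assumes one package name per line in the allowlist.
add_to_allowlist() {
    allowlist="$1"
    shift
    for pkg in "$@"; do
        # -x: match the whole line, -F: treat the name as a fixed string
        if ! grep -qxF -- "$pkg" "$allowlist"; then
            printf '%s\n' "$pkg" >> "$allowlist"
        fi
    done
}

# Example with illustrative package names:
# add_to_allowlist environment_configs/package_lists/allowlist-full-python-pypi-tier3.list six python-dateutil
```

Appending rather than re-sorting keeps the existing order of the file untouched; if the list is expected to be sorted, sort it as a separate step.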
Changing the version of a package
If you want to update the version of one of the packages we install from a .deb file (e.g. RStudio), you will need to edit deployment/secure_research_desktop/cloud_init/cloud-init-buildimage-ubuntu-<version>.mustache.yaml
Find the appropriate /installation/<package name>.debinfo section under the write_files: key
Update the version number and the sha256 hash for the file
Check that the file naming structure still matches the format described in this .debinfo file
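One way to obtain the new hash is to compute it from the downloaded .deb file. The following is a minimal sketch, assuming GNU coreutils sha256sum is available (on macOS, shasum -a 256 prints the same format); deb_sha256 and the example file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Safe Haven repository):
# print only the SHA-256 hash of a file.
deb_sha256() {
    # sha256sum prints "<hash>  <file name>"; keep the first field
    sha256sum "$1" | cut -d ' ' -f 1
}

# Example with a hypothetical file name:
# deb_sha256 ./some-package.deb
```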
3. Build a release candidate
In order to provision a candidate VM you will need to do the following:
at ./deployment/secure_research_desktop/setup
PS> ./Provision_Compute_VM.ps1 -shmId <SHM ID>
where <SHM ID> is the management environment ID for this SRE
Note
Although the ./Provision_Compute_VM.ps1 script will finish running in a few minutes, the build itself will take several hours.
We recommend monitoring the build by accessing the machine using ssh (the ssh info should be printed at the end of the Provision_Compute_VM.ps1 script) and either reading through the full build log at /var/log/cloud-init-output.log or running the summary script using /opt/monitoring/analyse_build.py.
NB. You will need to connect from an approved administrator IP address.
NB. The VM will automatically shut down at the end of the cloud-init process. If you want to analyse the build after this point, you will need to turn it back on in the Azure portal.
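The two monitoring options described in this note could be wrapped as shown here. This is a sketch only: the function names are hypothetical, <user>@<IP address> stands for the ssh details printed by the provisioning script, and it assumes that the log file and the summary script can be used by that account without elevated privileges.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the Safe Haven repository).
# "$1" is the ssh target, for example <user>@<IP address>.

# Follow the full cloud-init build log as it is written
follow_build_log() {
    ssh "$1" "tail -f /var/log/cloud-init-output.log"
}

# Run the build summary script once and print its output
build_summary() {
    ssh "$1" "/opt/monitoring/analyse_build.py"
}

# Example:
# build_summary "<user>@<IP address>"
```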
Error
If you are unable to access the VM over ssh please check whether you are trying to connect from one of the approved IP addresses that you defined under vmImages > buildIpAddresses in the SHM config file.
You can check which IP addresses are currently allowed by looking at the AllowBuildAdminSSH inbound connection rule in the RG_VMIMAGES_NETWORKING > NSG_VMIMAGES_BUILD_CANDIDATES network security group in the subscription where you are building the candidate VM.
4. Convert candidate VM to an image
Once you are happy with a particular candidate, you can convert it into an image as follows:
at ./deployment/secure_research_desktop/setup
PS> ./Convert_VM_To_Image.ps1 -shmId <SHM ID> -vmName <VM name>
where <SHM ID> is the management environment ID for this SRE
where <VM name> is the name of the virtual machine created during the provisioning step
This will build a new image in RG_VMIMAGES_STORAGE and delete the VM plus associated build artifacts (hard disk, network card and public IP address)
Note
The first step of this script will run the remote build analysis script. Please check that everything has built correctly before proceeding.
5. Register image in the gallery
Once you have created an image, it can be registered in the image gallery for future use using the Register_Image_In_Gallery.ps1 script.
at ./deployment/secure_research_desktop/setup
PS> ./Register_Image_In_Gallery.ps1 -shmId <SHM ID> -imageName <Image name>
where <SHM ID> is the management environment ID for this SRE
where <Image name> is the name of the VM image created during the conversion step
This will register the image in the shared gallery as a new version of the relevant SRD image. This command can take between 30 minutes and 1 hour to complete, as it has to replicate the VM across 3 different regions.