Managing Data Safe Haven deployments

Important

This document assumes that you already have access to a Safe Haven Management (SHM) environment and one or more Secure Research Environments (SREs) that are linked to it.

Explanation of symbols used in this guide

PowerShell command

PowerShell: estimate of time needed

  • This indicates a PowerShell command that you will need to run locally on your machine.

  • Ensure you have checked out (or downloaded) the appropriate tag of the Safe Haven repository from alan-turing-institute/data-safe-haven.

  • Open a PowerShell terminal and navigate to the indicated directory of your locally checked-out version of the Safe Haven repository.

  • Ensure that you are logged in to Azure by running the Connect-AzAccount command.

    Tip

    If your account is a guest in additional Azure tenants, you may need to add the -Tenant <Tenant ID> flag, where <Tenant ID> is the ID of the Azure tenant you want to deploy into.

  • This command will give you a URL and a short alphanumeric code.

  • Go to the URL in a web browser, enter the code and log in to your Azure account.

    Tip

    If you have several Azure accounts, make sure you use one that has permission to make changes to the subscription you are using.
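The login steps above can also be wrapped in a small helper. This is a minimal sketch, assuming the Az PowerShell module is installed; Connect-SafeHavenAzure is a hypothetical name, not a script from the Safe Haven repository, and the subscription name is a placeholder. The -UseDeviceAuthentication switch requests the URL-and-code flow described above.

```powershell
# Minimal sketch, assuming the Az PowerShell module (Az.Accounts) is installed.
# Connect-SafeHavenAzure is a hypothetical helper, not part of the Safe Haven repository.
function Connect-SafeHavenAzure {
    param(
        # Name or ID of the subscription that holds the Safe Haven resources
        [Parameter(Mandatory)][string]$SubscriptionName,
        # Optional: only needed if your account is a guest in additional tenants
        [string]$TenantId
    )
    # Device code flow: prints a URL and a short code to enter in a browser
    $connectParams = @{ UseDeviceAuthentication = $true }
    if ($TenantId) { $connectParams['Tenant'] = $TenantId }
    Connect-AzAccount @connectParams | Out-Null
    # Make sure later commands act on the intended subscription
    Set-AzContext -Subscription $SubscriptionName | Out-Null
    # Show which account, subscription and tenant are now active
    Get-AzContext | Select-Object Account, Subscription, Tenant
}

# Usage (placeholders in angle brackets):
# Connect-SafeHavenAzure -SubscriptionName '<Subscription name>' -TenantId '<Tenant ID>'
```

Printing the active context at the end makes it easy to spot that you are about to act on the wrong subscription.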

Remote command

Remote: estimate of time needed

  • This indicates a command that you will need to run remotely on an Azure virtual machine (VM) using Microsoft Remote Desktop.

  • Open Microsoft Remote Desktop and click Add Desktop / Add PC.

  • Enter the private IP address of the VM that you need to connect to in the PC name field (this can be found in the Azure Portal).

  • Enter the name of the VM (for example DC1-SHM-PROJECT) in the Friendly name field.

  • Click Add.

  • Ensure you are connected to the SHM VPN that you have set up.

  • Double-click on the desktop that appears under Saved Desktops or PCs.

  • Use the username and password specified by the appropriate section of the guide.

Tip

If you see a warning dialog that the certificate cannot be verified as root, accept this and continue.
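The private IP address needed for the PC name field can also be looked up with the Az PowerShell module instead of the portal. A minimal sketch, assuming you are logged in and that the VM's first network interface is the one you connect through; Get-VmPrivateIp is a hypothetical helper and the resource group and VM names are placeholders.

```powershell
# Minimal sketch, assuming the Az PowerShell module is installed and you are logged in.
# Get-VmPrivateIp is a hypothetical helper, not part of the Safe Haven repository.
function Get-VmPrivateIp {
    param(
        [Parameter(Mandatory)][string]$ResourceGroupName,
        [Parameter(Mandatory)][string]$VmName
    )
    $vm = Get-AzVM -ResourceGroupName $ResourceGroupName -Name $VmName
    # Assumes the first network interface is the one you connect through
    $nicId = $vm.NetworkProfile.NetworkInterfaces[0].Id
    $nic = Get-AzNetworkInterface -ResourceId $nicId
    return $nic.IpConfigurations[0].PrivateIpAddress
}

# Usage (placeholders in angle brackets). On Windows, mstsc opens Remote Desktop directly;
# you must already be connected to the SHM VPN.
# $ip = Get-VmPrivateIp -ResourceGroupName '<Resource group>' -VmName '<VM name>'
# mstsc /v:$ip
```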

Azure Portal operation

Portal: estimate of time needed

  • This indicates an operation which needs to be carried out in the Azure Portal using a web browser on your local machine.

  • You will need to log in to the portal using an account with privileges to make the necessary changes to the resources you are altering.

Azure Active Directory operation

Azure AD: estimate of time needed

  • This indicates an operation which needs to be carried out in the Azure Portal using a web browser on your local machine.

  • You will need to log in to the portal using an account with administrative privileges on the Azure Active Directory that you are altering.

  • Note that this might be different from the account which is able to create/alter resources in the Azure subscription where you are building the Safe Haven.

OS-dependent steps

The following icons indicate steps that depend on the OS you are using to deploy the SHM:

  • macOS

  • Windows

  • Linux

⏰ Renewing SRE Domain Certificates

The SSL certificate for the remote desktop frontend of an SRE must be renewed periodically so that the frontend remains accessible via HTTPS. After each 90-day period that the SRE is live, re-run the following script to update the certificate.

PowerShell: five minutes at 📁 ./deployment/secure_research_environment/setup

PS> ./Update_SRE_SSL_Certificate.ps1 -shmId <SHM ID> -sreId <SRE ID>
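To see how close the current certificate is to expiry, you can inspect the certificate that the SRE frontend serves. The following is a minimal sketch using .NET's SslStream from PowerShell; the function names are hypothetical, <SRE hostname> is a placeholder for your SRE's remote desktop address, and the 14-day threshold is an arbitrary choice rather than a Safe Haven setting.

```powershell
# Minimal sketch; helper names are hypothetical and not part of the Safe Haven repository.
# Connects to the HTTPS endpoint and returns the expiry date (NotAfter) of the served certificate.
# Default .NET certificate validation runs during the handshake, so an already-expired
# certificate makes this throw, which itself means renewal is overdue.
function Get-RemoteCertificateExpiry {
    param(
        [Parameter(Mandatory)][string]$HostName,
        [int]$Port = 443
    )
    $client = [System.Net.Sockets.TcpClient]::new($HostName, $Port)
    try {
        $ssl = [System.Net.Security.SslStream]::new($client.GetStream(), $false)
        try {
            $ssl.AuthenticateAsClient($HostName)
            $cert = [System.Security.Cryptography.X509Certificates.X509Certificate2]::new($ssl.RemoteCertificate)
            return $cert.NotAfter
        }
        finally { $ssl.Dispose() }
    }
    finally { $client.Dispose() }
}

# Pure check: is the certificate within $ThresholdDays of expiring (or already expired)?
function Test-CertificateRenewalDue {
    param(
        [Parameter(Mandatory)][datetime]$NotAfter,
        [int]$ThresholdDays = 14,
        [datetime]$Now = (Get-Date)
    )
    return ($NotAfter - $Now).TotalDays -le $ThresholdDays
}

# Usage (placeholder in angle brackets):
# $expiry = Get-RemoteCertificateExpiry -HostName '<SRE hostname>'
# if (Test-CertificateRenewalDue -NotAfter $expiry) { Write-Host 'Renewal due: re-run Update_SRE_SSL_Certificate.ps1' }
```

Keeping the date comparison in a separate function means it can be checked without network access.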

↗️ Resize the Virtual Machine (VM) of a Secure Research Desktop (SRD)

Sometimes during a project that uses a deployed SRE, researchers may find the available compute inadequate for their purposes and wish to increase the size of the SRD’s VM. The simplest way to resize a VM is via the Azure Portal, but it can also be done via script.

To resize via the Azure Portal:

  • Log into the Azure Portal and locate the VM inside the Resource Group called RG_SHM_<SHM ID>_SRE_<SRE ID>_COMPUTE.

  • Follow the Azure documentation on resizing a VM through the Azure Portal.
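The same portal resize can be reproduced with the Az PowerShell module. This is a minimal sketch, assuming you are logged in and that the target size is available for the VM; Resize-SrdVm and Get-SreComputeResourceGroupName are hypothetical helper names, not scripts from the Safe Haven repository.

```powershell
# Minimal sketch, assuming the Az PowerShell module is installed and you are logged in.
# Helper names are hypothetical; they are not part of the Safe Haven repository.
function Get-SreComputeResourceGroupName {
    param([Parameter(Mandatory)][string]$ShmId, [Parameter(Mandatory)][string]$SreId)
    return "RG_SHM_${ShmId}_SRE_${SreId}_COMPUTE"
}

function Resize-SrdVm {
    param(
        [Parameter(Mandatory)][string]$ShmId,
        [Parameter(Mandatory)][string]$SreId,
        [Parameter(Mandatory)][string]$VmName,
        [Parameter(Mandatory)][string]$NewSize  # an Azure VM size name, e.g. Standard_D8s_v3
    )
    $resourceGroup = Get-SreComputeResourceGroupName -ShmId $ShmId -SreId $SreId
    $vm = Get-AzVM -ResourceGroupName $resourceGroup -Name $VmName
    # Deallocate first: the resize restarts the VM anyway, and some sizes are only offered when it is stopped
    Stop-AzVM -ResourceGroupName $resourceGroup -Name $VmName -Force | Out-Null
    $vm.HardwareProfile.VmSize = $NewSize
    Update-AzVM -ResourceGroupName $resourceGroup -VM $vm | Out-Null
    Start-AzVM -ResourceGroupName $resourceGroup -Name $VmName | Out-Null
}

# Usage (placeholders in angle brackets); researchers will be disconnected while the VM restarts:
# Resize-SrdVm -ShmId '<SHM ID>' -SreId '<SRE ID>' -VmName '<VM name>' -NewSize '<VM size>'
```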

To resize via script:
  • Log into the Azure Portal and locate the VM inside the Resource Group called RG_SHM_<SHM ID>_SRE_<SRE ID>_COMPUTE.

  • Make a note of the last octet of its private IP address.

PowerShell: ten minutes at 📁 ./deployment/secure_research_environment/setup

PS> ./Add_Single_SRD.ps1 -shmId <SHM ID> -sreId <SRE ID> -ipLastOctet <IP last octet> [-vmSize <VM size>] -Upgrade -Force
  • where <SHM ID> is the management environment ID for this SHM

  • where <SRE ID> is the secure research environment ID for this SRE

  • where <IP last octet> is the last octet of the IP address (check what this is in the Azure Portal)

  • where <VM size> is the new Azure VM size

  • where -Upgrade is required to ensure that the old VM is replaced

  • where -Force ensures that -Upgrade works even when the new VM is built from the same image
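Instead of reading the IP address from the portal, you can look up the last octet with the Az PowerShell module. A minimal sketch, assuming you are logged in and that the SRD's first network interface carries the address the script expects; Get-IpLastOctet and Get-SrdIpLastOctet are hypothetical helpers.

```powershell
# Minimal sketch, assuming the Az PowerShell module is installed and you are logged in.
# Helper names are hypothetical; they are not part of the Safe Haven repository.
function Get-IpLastOctet {
    param([Parameter(Mandatory)][string]$IpAddress)
    # [ipaddress]::Parse validates the string and rejects malformed input
    $bytes = [ipaddress]::Parse($IpAddress).GetAddressBytes()
    if ($bytes.Length -ne 4) { throw "Expected an IPv4 address, got '$IpAddress'" }
    return [int]$bytes[3]
}

function Get-SrdIpLastOctet {
    param(
        [Parameter(Mandatory)][string]$ShmId,
        [Parameter(Mandatory)][string]$SreId,
        [Parameter(Mandatory)][string]$VmName
    )
    $vm = Get-AzVM -ResourceGroupName "RG_SHM_${ShmId}_SRE_${SreId}_COMPUTE" -Name $VmName
    # Assumes the first network interface holds the SRD's private address
    $nic = Get-AzNetworkInterface -ResourceId $vm.NetworkProfile.NetworkInterfaces[0].Id
    return Get-IpLastOctet -IpAddress $nic.IpConfigurations[0].PrivateIpAddress
}

# Usage (placeholders in angle brackets), from ./deployment/secure_research_environment/setup:
# $octet = Get-SrdIpLastOctet -ShmId '<SHM ID>' -SreId '<SRE ID>' -VmName '<VM name>'
# ./Add_Single_SRD.ps1 -shmId '<SHM ID>' -sreId '<SRE ID>' -ipLastOctet $octet -vmSize '<VM size>' -Upgrade -Force
```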

Tip

If the new VM size you want isn’t shown as available in the Azure Portal, there are several steps that can be taken.

Firstly, try stopping the VM and checking again whether the size you want is available, as this can reveal additional options that aren’t shown whilst the VM is running. For example, when resizing to an N-series VM (see 💽 Using GPUs in SRDs), we’ve found that NVIDIA options such as the NVv3-series are not always shown as available.

Next, you can try to request an increase in the vCPU quota for the VM family of the desired VM:

  • Navigate to the Azure Portal and, on the subscription page, click Usage + quotas under Settings.

  • Choose the family appropriate to the VM that you want to resize to, and select a region appropriate for the SRE.

  • Click the pen icon, set the New Limit to at least the number of vCPUs required by the VM that you want, then click Submit.

  • After the request is accepted, resize the VM as above.

  • In some cases, the quota increase may require a support request to be submitted to Microsoft.
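Before requesting a quota increase, it can help to list which N-series sizes the VM can currently be resized to and how much of each family quota is already in use. A minimal sketch, assuming the Az PowerShell module and an active login; the helper names are hypothetical, and Test-VcpuQuota only checks a single family limit (the regional total vCPU quota also applies).

```powershell
# Minimal sketch, assuming the Az PowerShell module is installed and you are logged in.
# Helper names are hypothetical; they are not part of the Safe Haven repository.

# N-series sizes the existing VM can be resized to (re-run after stopping the VM, per the tip above)
function Get-AvailableNSeriesSizes {
    param(
        [Parameter(Mandatory)][string]$ResourceGroupName,
        [Parameter(Mandatory)][string]$VmName
    )
    Get-AzVMSize -ResourceGroupName $ResourceGroupName -VMName $VmName |
        Where-Object { $_.Name -like 'Standard_N*' } |
        Select-Object Name, NumberOfCores, MemoryInMB
}

# Current usage and limit for each quota in a region (VM families appear by their localized name)
function Get-VmFamilyQuota {
    param([Parameter(Mandatory)][string]$Location)
    Get-AzVMUsage -Location $Location |
        Select-Object @{ n = 'Family'; e = { $_.Name.LocalizedValue } }, CurrentValue, Limit
}

# Pure check: would $RequiredVcpus additional vCPUs fit within a family limit?
function Test-VcpuQuota {
    param(
        [Parameter(Mandatory)][int]$CurrentValue,
        [Parameter(Mandatory)][int]$Limit,
        [Parameter(Mandatory)][int]$RequiredVcpus
    )
    return ($CurrentValue + $RequiredVcpus) -le $Limit
}

# Usage (placeholders in angle brackets):
# Get-AvailableNSeriesSizes -ResourceGroupName 'RG_SHM_<SHM ID>_SRE_<SRE ID>_COMPUTE' -VmName '<VM name>'
# Get-VmFamilyQuota -Location '<Azure region>' | Where-Object Family -like '*NV*'
```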

➕ Add a new SRD

The -VmSizes parameter provided when deploying the SRE (with the Deploy_SRE.ps1 script) determines how many SRDs are created and how large each one will be.

To deploy a new SRD into the SRE, run the following script:

PowerShell: ten minutes at 📁 ./deployment/secure_research_environment/setup

PS> ./Add_Single_SRD.ps1 -shmId <SHM ID> -sreId <SRE ID> -ipLastOctet <IP last octet> [-vmSize <VM size>]
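A new SRD needs an <IP last octet> that is not already in use. The sketch below picks the lowest unused value from a range you supply. It assumes the existing SRD network interfaces live in the COMPUTE resource group; the helper names are hypothetical, and the valid range of octets for your SRE's subnet is an assumption you must check yourself.

```powershell
# Minimal sketch; helper names are hypothetical and not part of the Safe Haven repository.
# Returns the lowest last octet in [First, Last] that is not already used.
function Get-NextFreeIpLastOctet {
    param(
        [Parameter(Mandatory)][AllowEmptyCollection()][int[]]$UsedOctets,
        [Parameter(Mandatory)][ValidateRange(1, 254)][int]$First,
        [Parameter(Mandatory)][ValidateRange(1, 254)][int]$Last
    )
    foreach ($candidate in $First..$Last) {
        if ($UsedOctets -notcontains $candidate) { return $candidate }
    }
    throw "No free last octet between $First and $Last"
}

# Collects the last octets used by NICs in the compute resource group (needs Az and a login).
# Assumes SRD NICs are created in RG_SHM_<SHM ID>_SRE_<SRE ID>_COMPUTE.
function Get-UsedIpLastOctets {
    param([Parameter(Mandatory)][string]$ShmId, [Parameter(Mandatory)][string]$SreId)
    Get-AzNetworkInterface -ResourceGroupName "RG_SHM_${ShmId}_SRE_${SreId}_COMPUTE" |
        ForEach-Object { $_.IpConfigurations } |
        ForEach-Object { [int]($_.PrivateIpAddress.Split('.')[3]) }
}

# Usage (placeholders in angle brackets); @() guarantees an array even if no NICs exist:
# $used = @(Get-UsedIpLastOctets -ShmId '<SHM ID>' -SreId '<SRE ID>')
# $octet = Get-NextFreeIpLastOctet -UsedOctets $used -First <first octet> -Last <last octet>
# ./Add_Single_SRD.ps1 -shmId '<SHM ID>' -sreId '<SRE ID>' -ipLastOctet $octet -vmSize '<VM size>'
```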

💽 Using GPUs in SRDs

When you ↗️ Resize the Virtual Machine (VM) of a Secure Research Desktop (SRD) or ➕ Add a new SRD featuring a GPU (N-series in Azure), you’ll need to ensure that it has an NVIDIA GPU (as opposed to AMD or other), because only NVIDIA GPUs support the drivers and CUDA libraries installed on the SRD image. See the Azure docs for more information.

To test that a GPU-enabled VM is working as expected, log into the SRE and run nvidia-smi in a terminal.

👑 Performing operations that require superuser privileges

If you need to perform any operations in the SRE that require root access, you will need to log into the compute VM via the Serial Console in the Azure Portal.

Console access to the SRE VMs, including those for each web app and the compute VM, can be achieved through the Azure portal. All VMs share the same <admin username>, but each has its own <admin password>, which will need to be retrieved from the SRE key vault before accessing the console.

  • From the Azure portal, navigate to the Resource Group RG_SHM_<SHM ID>_SRE_<SRE ID>_SECRETS.

  • Click on the SRE key vault kv-<SHM ID>-sre-<SRE ID>.

  • From the menu on the left, select Secrets from the Objects section.

  • All VMs share the same <admin username>, found in the sre-<SRE ID>-vm-admin-username secret.

  • Each VM has its own <admin password>, found in the sre-<SRE ID>-vm-admin-password-<VM> secret.
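The secrets listed above can also be read with the Az PowerShell module. A minimal sketch, assuming a recent Az.KeyVault module (for Get-AzKeyVaultSecret -AsPlainText) and read access to the SRE key vault; the helper names are hypothetical, and VmLabel stands for the <VM> suffix exactly as it appears in the key vault.

```powershell
# Minimal sketch, assuming a recent Az.KeyVault module and read access to the SRE key vault.
# Helper names are hypothetical; they are not part of the Safe Haven repository.
function Get-SreVmSecretNames {
    param(
        [Parameter(Mandatory)][string]$SreId,
        # The <VM> suffix of the password secret, exactly as it appears in the key vault
        [Parameter(Mandatory)][string]$VmLabel
    )
    [pscustomobject]@{
        Username = "sre-$SreId-vm-admin-username"
        Password = "sre-$SreId-vm-admin-password-$VmLabel"
    }
}

function Get-SreVmCredential {
    param(
        [Parameter(Mandatory)][string]$VaultName,  # the SRE key vault name shown in the portal
        [Parameter(Mandatory)][string]$SreId,
        [Parameter(Mandatory)][string]$VmLabel
    )
    $names = Get-SreVmSecretNames -SreId $SreId -VmLabel $VmLabel
    $username = Get-AzKeyVaultSecret -VaultName $VaultName -Name $names.Username -AsPlainText
    # Keep the password as a SecureString instead of printing it to the terminal
    $password = (Get-AzKeyVaultSecret -VaultName $VaultName -Name $names.Password).SecretValue
    return [pscredential]::new($username, $password)
}

# Usage (placeholders in angle brackets):
# $cred = Get-SreVmCredential -VaultName '<SRE key vault>' -SreId '<SRE ID>' -VmLabel '<VM>'
# $cred.UserName
# $cred.GetNetworkCredential().Password   # reveals the password; only do this when you need to type it
```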

Once you have the <admin username> and <admin password>, you will be able to log in to the VM console as follows:

  • From the Azure portal, navigate to the correct resource group:

    • RG_SHM_<SHM ID>_SRE_<SRE ID>_WEBAPPS for the web applications

    • RG_SHM_<SHM ID>_SRE_<SRE ID>_COMPUTE for the compute VM

  • Click on the relevant VM.

  • From the menu on the left, scroll down to the Help section and select Serial console

  • After a short time, you will be shown the console for the VM. You may need to press a key to be shown the login prompt.

  • Log in with the details you retrieved earlier to be given root access to the VM.

🔥 Remove a single SRE

In order to tear down an SRE, use the following procedure:

On your deployment machine:

  • Ensure you have the same version of the Data Safe Haven repository as was used by your deployment team.

  • Open a PowerShell terminal and navigate to the deployment/administration directory within the Data Safe Haven repository.

  • Ensure you are logged into Azure within PowerShell using the command: Connect-AzAccount. This command will give you a URL and a short alphanumeric code. You will need to visit that URL in a web browser and enter the code.

  • NB. If your account is a guest in additional Azure tenants, you may need to add the -Tenant <Tenant ID> flag, where <Tenant ID> is the ID of the Azure tenant you want to deploy into.

  • Run the following script:

    ./SRE_Teardown.ps1 -shmId <SHM ID> -sreId <SRE ID>
    
  • If you provide the optional -dryRun parameter, the names of all affected resources will be printed but nothing will be deleted.

🔚 Remove a complete Safe Haven

💥 Tear down any attached SREs

On your deployment machine:

  • Ensure you have the same version of the Data Safe Haven repository as was used by your deployment team.

  • Open a PowerShell terminal and navigate to the deployment/administration directory within the Data Safe Haven repository.

  • Ensure you are logged into Azure within PowerShell using the command: Connect-AzAccount. This command will give you a URL and a short alphanumeric code. You will need to visit that URL in a web browser and enter the code.

    Attention

    If your account is a guest in additional Azure tenants, you may need to add the -Tenant <Tenant ID> flag, where <Tenant ID> is the ID of the Azure tenant you want to deploy into.

  • For each SRE attached to the SHM, do the following:

    • Tear down the SRE by running:

    ./SRE_Teardown.ps1 -shmId <SHM ID> -sreId <SRE ID>
    

    where <SHM ID> and <SRE ID> are the IDs specified in the relevant config files

    Note

    If you provide the optional -dryRun parameter, the names of all affected resources will be printed but nothing will be deleted.
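If you are unsure which SREs are still attached, you can infer their IDs from resource group names and dry-run the teardown for each one. A minimal sketch, assuming you are logged in, running from deployment/administration, and that every SRE resource group follows the RG_SHM_<SHM ID>_SRE_<SRE ID>_<SUFFIX> pattern used in this guide with IDs that contain no underscores; Get-SreIdsFromResourceGroupNames is a hypothetical helper.

```powershell
# Minimal sketch; Get-SreIdsFromResourceGroupNames is hypothetical, not part of the Safe Haven repository.
# Extracts distinct SRE IDs from names like RG_SHM_<SHM ID>_SRE_<SRE ID>_<SUFFIX>,
# assuming the IDs themselves contain no underscores.
function Get-SreIdsFromResourceGroupNames {
    param(
        [Parameter(Mandatory)][string]$ShmId,
        [Parameter(Mandatory)][AllowEmptyCollection()][string[]]$ResourceGroupNames
    )
    $pattern = '^RG_SHM_{0}_SRE_([^_]+)_' -f [regex]::Escape($ShmId)
    $ResourceGroupNames |
        ForEach-Object { if ($_ -match $pattern) { $Matches[1] } } |
        Sort-Object -Unique
}

# Usage: review the dry-run output for every SRE before running again without -dryRun.
# $rgNames = (Get-AzResourceGroup).ResourceGroupName
# foreach ($sreId in Get-SreIdsFromResourceGroupNames -ShmId '<SHM ID>' -ResourceGroupNames $rgNames) {
#     ./SRE_Teardown.ps1 -shmId '<SHM ID>' -sreId $sreId -dryRun
# }
```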

🔓 Disconnect from the Azure Active Directory

Connect to the SHM Domain Controller (DC1) via Remote Desktop Client over the SHM VPN connection.

  • Log in as a domain user (i.e. <admin username>@<SHM domain>) using the username and password obtained from the Azure portal.

  • If you see a warning dialog that the certificate cannot be verified as root, accept this and continue.

  • Open PowerShell as an administrator:

    • Navigate to C:\Installation

    • Run .\Disconnect_AD.ps1

    • You will need to provide login credentials (including MFA if set up) for <admin username>@<SHM domain>

Attention

Full disconnection of the Azure Active Directory can take up to 72 hours but is typically less. If you are planning to install a new SHM connected to the same Azure Active Directory you may find the AzureADConnect installation step requires you to wait for the previous disconnection to complete.

💣 Tear down the SHM

On your deployment machine:

  • Ensure you have the same version of the Data Safe Haven repository as was used by your deployment team.

  • Open a PowerShell terminal and navigate to the deployment/administration directory within the Data Safe Haven repository.

  • Ensure you are logged into Azure within PowerShell using the command: Connect-AzAccount. This command will give you a URL and a short alphanumeric code. You will need to visit that URL in a web browser and enter the code.

    Attention

    If your account is a guest in additional Azure tenants, you may need to add the -Tenant <Tenant ID> flag, where <Tenant ID> is the ID of the Azure tenant you want to deploy into.

  • Tear down the SHM by running:

    ./SHM_Teardown.ps1 -shmId <SHM ID>
    

    where <SHM ID> is the management environment ID specified in the configuration file.
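Before and after running the teardown, it can be reassuring to list which resource groups still carry the SHM's prefix. A minimal sketch, assuming you are logged in and that, like the SRE resource groups named in this guide, the SHM's resource groups start with RG_SHM_<SHM ID>_; Select-ShmResourceGroupNames is a hypothetical helper.

```powershell
# Minimal sketch; Select-ShmResourceGroupNames is hypothetical, not part of the Safe Haven repository.
# Returns the names that start with RG_SHM_<SHM ID>_ (case-insensitive, like Azure resource group names).
function Select-ShmResourceGroupNames {
    param(
        [Parameter(Mandatory)][string]$ShmId,
        [Parameter(Mandatory)][AllowEmptyCollection()][string[]]$ResourceGroupNames
    )
    $prefix = "RG_SHM_${ShmId}_"
    $ResourceGroupNames |
        Where-Object { $_.StartsWith($prefix, [System.StringComparison]::OrdinalIgnoreCase) }
}

# Usage (placeholder in angle brackets). Any remaining name containing _SRE_ suggests an SRE
# is still attached and should be torn down before the SHM.
# $remaining = @(Select-ShmResourceGroupNames -ShmId '<SHM ID>' -ResourceGroupNames (Get-AzResourceGroup).ResourceGroupName)
# $remaining | Where-Object { $_ -like '*_SRE_*' }
```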