Deploy a Secure Research Environment with Apache Guacamole
These instructions will walk you through deploying a Secure Research Environment (SRE) that uses an existing Safe Haven Management (SHM) environment.
Explanation of symbols used in this guide
PowerShell command
This indicates a PowerShell command which you will need to run locally on your machine.
Ensure you have checked out (or downloaded) the appropriate tag of the Safe Haven repository from alan-turing-institute/data-safe-haven.
Open a PowerShell terminal and navigate to the indicated directory of your locally checked-out version of the Safe Haven repository.
Ensure that you are logged into Azure by running the Connect-AzAccount command.
Tip
If your account is a guest in additional Azure tenants, you may need to add the -Tenant <Tenant ID> flag, where <Tenant ID> is the ID of the Azure tenant you want to deploy into.
This command will give you a URL and a short alphanumeric code.
Go to the URL in a web browser, enter the code and log in to your account on Azure.
Tip
If you have several Azure accounts, make sure you use one that has permissions to make changes to the subscription you are using.
Remote command
This indicates a command which you will need to run remotely on an Azure virtual machine (VM) using Microsoft Remote Desktop.
Open Microsoft Remote Desktop and click Add Desktop / Add PC.
Enter the private IP address of the VM that you need to connect to in the PC name field (this can be found by looking in the Azure portal).
Enter the name of the VM (for example DC1-SHM-PROJECT) in the Friendly name field.
Click Add.
Ensure you are connected to the SHM VPN that you have set up.
Double click on the desktop that appears under Saved Desktops or PCs.
Use the username and password specified by the appropriate section of the guide.
Tip
If you see a warning dialog that the certificate cannot be verified as root, accept this and continue.
Azure Portal operation
This indicates an operation which needs to be carried out in the Azure Portal using a web browser on your local machine.
You will need to log in to the portal using an account with privileges to make the necessary changes to the resources you are altering.
Microsoft Entra ID operation
This indicates an operation which needs to be carried out in the Azure Portal using a web browser on your local machine.
You will need to log in to the portal using an account with administrative privileges on the Microsoft Entra ID that you are altering.
Note that this might be different from the account which is able to create/alter resources in the Azure subscription where you are building the Safe Haven.
OS-dependent steps
The following icons indicate steps that depend on the OS you are using to deploy the SHM
MacOS
Windows
Linux
1. Prerequisites
An SHM environment that has already been deployed in Azure.
Follow the Safe Haven Management (SHM) deployment guide if you have not done so already.
An Azure subscription with sufficient credits to build the environment in: we recommend around $1,000 as a reasonable starting point.
This can be the same as or different from the one where the SHM is deployed.
Tip
Ensure that the Owner of the subscription is an Azure Security group that contains all administrators and no-one else.
We recommend using separate Microsoft Entra IDs for users and administrators.
Access to a global administrator account on the SHM Microsoft Entra ID.
Software
PowerShell with support for Azure
We recommend installing the latest stable release of PowerShell. We have most recently tested deployment using version 7.4.1.
Install the Azure PowerShell Module using Install-Module -Name Az -RequiredVersion 5.0.0 -Repository PSGallery
Microsoft Remote Desktop
On macOS this can be installed from the Apple store
OpenSSL
Install using your package manager of choice
Hint
If you run:
PS> Start-Transcript -Path <a log file>
before you start your deployment and
PS> Stop-Transcript
afterwards, you will automatically get a full log of the Powershell commands you have run.
VPN connection to the SHM VNet
For some operations, you will need to log on to some of the VMs that you deploy and make manual changes. This is done using the VPN which should have been deployed when setting up the SHM environment.
SRE domain name
You will need access to a publicly routable domain name for the SRE and its name servers.
This can be a subdomain of the Safe Haven Management domain, e.g. sandbox.project.turingsafehaven.ac.uk, or a top-level domain (e.g. mydatasafehaven.co.uk).
Deploying multiple SREs in parallel
Important
You can only deploy to one SRE at a time from a given computer as the Az Powershell module can only work within one Azure subscription at a time.
If you need to deploy multiple SREs in parallel you will need to use multiple computers. These can be different physical computers or you can provision dedicated deployment VMs - this is beyond the scope of this guide.
2. Secure Research Environment configuration
The full configuration details for a new SRE are generated by defining a few "core" properties for the new SRE and the management environment in which it will be deployed.
Secure research environment ID
Choose a short ID <SRE ID> to identify the secure research environment (e.g. sandbox).
This can have a maximum of seven alphanumeric characters.
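The constraint on the SRE ID can be checked before any deployment script is run. The following is a minimal Python sketch and is not part of the Safe Haven codebase; the function name `is_valid_sre_id` is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits only.

```python
import re

# Assumption: "alphanumeric" is taken to mean ASCII letters and digits only.
SRE_ID_PATTERN = re.compile(r"[A-Za-z0-9]{1,7}")


def is_valid_sre_id(sre_id: str) -> bool:
    """Return True if sre_id has between one and seven alphanumeric characters."""
    return SRE_ID_PATTERN.fullmatch(sre_id) is not None


for candidate in ("sandbox", "sandbox2", "my-sre"):
    print(candidate, is_valid_sre_id(candidate))
```

Here `sandbox` passes (seven characters), while `sandbox2` is too long and `my-sre` contains a non-alphanumeric character.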
SHM configuration properties
The core properties for the relevant pre-existing Safe Haven Management (SHM) environment must be defined in a JSON file named shm_<SHM ID>_core_config.json in the environment_configs folder.
Please read the instructions to find out what to put in this file.
SRE configuration properties
The core properties for the secure research environment (SRE) must be defined in a JSON file named sre_<SHM ID><SRE ID>_core_config.json in the environment_configs folder.
The following core SRE properties are required - look in the environment_configs folder to see some examples.
{
    "sreId": "The <SRE ID> that you decided on above (eg. 'sandbox').",
    "tier": "The data classification tier for the SRE. This controls the outbound network restrictions on the SRE and which mirror set the SRE is peered with",
    "shmId": "The <SHM ID> that you decided on above (eg. 'testa').",
    "subscriptionName": "Azure subscription that the SRE will be deployed into.",
    "ipPrefix": "The three octet IP address prefix for the Class A range used by the SRE. See suggestion below on how to set this",
    "inboundAccessFrom": "A comma-separated string of IP ranges (addresses or CIDR ranges) from which access to the RDS webclient is permitted. See the tip below for a suggestion on how to set this.",
    "outboundInternetAccess": "Whether to allow outbound internet access from inside the remote desktop environment. Either ('Yes', 'Allow', 'Permit'), ('No', 'Deny', 'Forbid') or 'default' (for Tier 0 and 1 'Allow' otherwise 'Deny')",
    "computeVmImage": {
        "type": "The name of the SRD image (most commonly 'Ubuntu')",
        "version": "The version of the SRD image (e.g. 0.1.2019082900)"
    },
    "remoteDesktopProvider": "[Deprecated] Only 'ApacheGuacamole' is supported. If this parameter is not supplied, it will default to 'ApacheGuacamole'",
    "azureAdminGroupName": "[Optional] Azure Security Group that admins of this SRE will belong to. If not specified then the same one as the SHM will be used.",
    "dataAdminIpAddresses": "A list of one or more IP addresses which admins will be using to transfer sensitive data to/from the secure Azure storage area (if not specified then Turing IP addresses will be used).",
    "databases": "[Optional] A list of zero or more database flavours from the following list ('MSSQL', 'PostgreSQL'). For example ['MSSQL', 'PostgreSQL'] would deploy both an MS-SQL and a PostgreSQL database.",
    "deploymentIpAddresses": "[Optional] A list of one or more IP addresses which admins will be using when deploying the SRE (if not specified then deployment commands from any IP address will be permitted).",
    "domain": "[Optional] The fully qualified domain name for the SRE. If not specified then <SRE ID>.<SHM domain> will be used.",
    "overrides": "[Optional, Advanced] Do not use this unless you know what you're doing! If you want to override any of the default settings, you can do so by creating the same JSON structure that would be found in the final config file and nesting it under this entry. For example, to change the name of the Key Vault secret containing the MSSQL admin password, you could use something like: 'sre: { databases: { dbmssql: { adminPasswordSecretName: my-password-name } } }'"
}
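Before running any deployment script it can help to confirm that a draft core config file has the expected name and contains every property it needs. The sketch below is a hypothetical Python helper, not part of the Safe Haven repository. It assumes that any key not marked [Optional] or [Deprecated] in the listing above is required, and it leaves out dataAdminIpAddresses because the listing describes a fallback for it. The example config values (subscription name, tier) are made up for illustration.

```python
import json

# Assumption: keys not marked [Optional] or [Deprecated] in the listing are
# treated as required here. dataAdminIpAddresses is left out because the
# listing describes a fallback when it is not specified.
REQUIRED_KEYS = (
    "sreId",
    "tier",
    "shmId",
    "subscriptionName",
    "ipPrefix",
    "inboundAccessFrom",
    "outboundInternetAccess",
    "computeVmImage",
)


def core_config_filename(shm_id: str, sre_id: str) -> str:
    """Build the file name sre_<SHM ID><SRE ID>_core_config.json."""
    return f"sre_{shm_id}{sre_id}_core_config.json"


def missing_keys(config: dict) -> list:
    """List required keys (and computeVmImage sub-keys) absent from config."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    image = config.get("computeVmImage")
    if isinstance(image, dict):
        missing += [
            f"computeVmImage.{key}" for key in ("type", "version") if key not in image
        ]
    return missing


# Hypothetical, deliberately incomplete config used only for illustration.
example = json.loads("""
{
    "sreId": "sandbox",
    "tier": "1",
    "shmId": "testa",
    "subscriptionName": "My SRE Subscription",
    "inboundAccessFrom": "Internet",
    "outboundInternetAccess": "default",
    "computeVmImage": {"type": "Ubuntu"}
}
""")

print(core_config_filename("testa", "sandbox"))
print(missing_keys(example))
```

For this example the helper reports that ipPrefix and computeVmImage.version still need to be filled in.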
Tip
We recommend the following for the inboundAccessFrom setting:
Tier 0/1 SREs: this can be set to Internet, allowing access from anywhere.
Tier 2 SREs: this should correspond to the IP addresses of organisational networks (including guest networks) for all approved partner organisations (i.e. specific networks managed by the organisation, such as EduRoam, Turing Guest, Turing Secure).
Tier 3 SREs: this should correspond to the IP addresses of restricted networks for all approved partner organisations. These should only permit connections from within medium security access controlled physical spaces and from managed devices (e.g. Turing Secure).
Important
The ipPrefix must be unique for each SRE attached to the same SHM. Each SRE needs a range of 2048 IP addresses (a /21 range in CIDR notation) in a private IP range.
The config itself expects the first three digits denoting the range (e.g. "ipPrefix": "10.11.0.0" rather than "ipPrefix": "10.11.0.0/21")
It is important that the range chosen doesn't overlap with the SHM (by default 10.0.0.0 - 10.0.7.255), the package repositories (by default 10.10.2.0 - 10.10.3.255) or any other SRE.
You may find this tool helpful to convert between IP address ranges and CIDRs.
Alan Turing Institute default
We assign consecutive /21 ranges starting from 10.11.0.0/21 (i.e. the first three SREs will use 10.11.0.0/21, 10.11.8.0/21 and 10.11.16.0/21).
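The range arithmetic can be reproduced with Python's standard ipaddress module. This is an illustrative sketch only: the helper names `nth_sre_range` and `conflicts` are hypothetical, and the reserved ranges are the default values quoted in this guide, which may differ in your deployment.

```python
import ipaddress

# Default ranges quoted in this guide; adjust if your SHM uses different ones.
SHM_RANGE = ipaddress.ip_network("10.0.0.0/21")            # 10.0.0.0 - 10.0.7.255
PACKAGE_REPO_RANGE = ipaddress.ip_network("10.10.2.0/23")  # 10.10.2.0 - 10.10.3.255
FIRST_SRE_RANGE = ipaddress.ip_network("10.11.0.0/21")


def nth_sre_range(index):
    """Return the index-th consecutive /21 range, counting from zero."""
    start = int(FIRST_SRE_RANGE.network_address) + index * FIRST_SRE_RANGE.num_addresses
    return ipaddress.ip_network(f"{ipaddress.ip_address(start)}/21")


def conflicts(candidate, existing_sre_ranges=()):
    """Return the reserved or in-use ranges that overlap the candidate /21."""
    net = ipaddress.ip_network(candidate)  # raises ValueError if misaligned
    if net.prefixlen != 21:
        raise ValueError(f"{net} is not a /21 range")
    reserved = [SHM_RANGE, PACKAGE_REPO_RANGE]
    reserved += [ipaddress.ip_network(r) for r in existing_sre_ranges]
    return [str(other) for other in reserved if net.overlaps(other)]


print([str(nth_sre_range(i)) for i in range(3)])
print(conflicts("10.10.0.0/21"))
print(conflicts("10.11.8.0/21", ["10.11.0.0/21"]))
```

A candidate range is safe to use when `conflicts` returns an empty list. A /21 that does not start on a 2048-address boundary (for example 10.11.4.0/21) is rejected by `ip_network` itself.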
(Optional) Verify code version
If you have cloned/forked the code from our GitHub repository, you can confirm which version of the Data Safe Haven you are currently using by running the following command:
PS> git tag --list | Select-String $(git describe --tags)
This will check the tag you are using against the list of known tags and print it out. You can include this confirmation in any record you keep of your deployment.
(Optional) View full SRE configuration
A full configuration, which will be used in subsequent steps, will be automatically generated from your core configuration. Should you wish to, you can print the full SRE config by running the following PowerShell command:
at ./deployment
PS> ./ShowConfigFile.ps1 -shmId <SHM ID> -sreId <SRE ID>
where <SHM ID> is the management environment ID for this SHM
where <SRE ID> is the secure research environment ID for this SRE
3. Deploy SRE
at ./deployment/secure_research_environment/setup
PS> ./Deploy_SRE.ps1 -shmId <SHM ID> -sreId <SRE ID> -VMs <VM sizes>
where <SHM ID> is the management environment ID for this SHM
where <SRE ID> is the secure research environment ID for this SRE
where <VM sizes> is a list of Azure VM sizes that you want to create. For example 'Standard_D2s_v3', 'default', 'Standard_NC6s_v3'. If you are unsure of the appropriate VM sizes, run the script with a single 'default'. The default VM size is Standard_D2s_v3.
VMs can be resized after deployment. See how to do so in the System Manager instructions.
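To make the meaning of 'default' in the VM size list concrete, the following Python sketch resolves a list of requested sizes into the sizes that would be created. It only illustrates the rule described above and is not how the deployment script itself is implemented; `resolve_vm_sizes` is a hypothetical name.

```python
# The default size stated in this guide.
DEFAULT_VM_SIZE = "Standard_D2s_v3"


def resolve_vm_sizes(vm_sizes):
    """Replace each 'default' entry with the default Azure VM size."""
    return [DEFAULT_VM_SIZE if size == "default" else size for size in vm_sizes]


requested = ["Standard_D2s_v3", "default", "Standard_NC6s_v3"]
print(resolve_vm_sizes(requested))
```

With the example list from this guide, the second entry resolves to Standard_D2s_v3, so two machines of that size and one Standard_NC6s_v3 would be requested.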
You will be prompted for credentials for:
a user with admin rights over the Azure subscriptions you plan to deploy into
a user with Global Administrator privileges over the SHM Microsoft Entra ID
This will perform the following actions, which can be run individually if desired:
Remove data from previous deployments
Caution
If you are redeploying an SRE in the same subscription and did not use the ./SRE_Teardown.ps1 script to clean up the previous deployment, then there may be residual SRE data in the SHM.
This script will remove any such data.
Note
If the subscription is not empty, confirm that it is not being used before deleting any resources in it.
at ./deployment/secure_research_environment/setup
PS> ./Remove_SRE_Data_From_SHM.ps1 -shmId <SHM ID> -sreId <SRE ID>
where <SHM ID> is the management environment ID for this SHM
where <SRE ID> is the secure research environment ID for this SRE
Register SRE with the SHM
at ./deployment/secure_research_environment/setup
PS> ./Setup_SRE_Key_Vault_And_Users.ps1 -shmId <SHM ID> -sreId <SRE ID>
where <SHM ID> is the management environment ID for this SHM
where <SRE ID> is the secure research environment ID for this SRE
This step will register service accounts with the SHM and also create a Key Vault in the SRE subscription (at Resource Groups > RG_SHM_<SHM ID>_SRE_<SRE ID>_SECRETS > kv-<SHM ID>-sre-<SRE ID>).
Create SRE DNS Zone
at ./deployment/secure_research_environment/setup
PS> ./Setup_SRE_DNS_Zone.ps1 -shmId <SHM ID> -sreId <SRE ID>
where <SHM ID> is the management environment ID for this SHM
where <SRE ID> is the secure research environment ID for this SRE
Error
If you see a message "You need to add the following NS records to the parent DNS system for..." you will need to manually add the specified NS records to the parent's DNS system, as follows:
Manual DNS configuration instructions
To find the required values for the NS records on the portal, click All resources in the far left panel, search for "DNS Zone" and locate the DNS Zone with the SRE's domain. The NS record will list 4 Azure name servers.
Duplicate these records to the parent DNS system as follows:
If the parent domain has an Azure DNS Zone, create an NS record set in this zone.
The name should be set to the subdomain (e.g. sandbox) or @ if using a custom domain, and the values duplicated from above.
For example, for a new subdomain sandbox.testa.dsgroupdev.co.uk, duplicate the NS records from the Azure DNS Zone sandbox.testa.dsgroupdev.co.uk to the Azure DNS Zone for testa.dsgroupdev.co.uk, by creating a record set with name sandbox.
If the parent domain is outside of Azure, create NS records in the registrar for the new domain with the same value as the NS records in the new Azure DNS Zone for the domain.
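The naming rule for the NS record set can be written down as a small Python sketch. This reflects one reading of the instructions above: the record set name is the SRE domain relative to the zone it is created in, and @ is used when the two are identical. The function `ns_record_set_name` is hypothetical and does not query or modify any DNS system.

```python
def ns_record_set_name(sre_domain, zone):
    """Return the record set name for sre_domain inside the given DNS zone."""
    sre = sre_domain.rstrip(".").lower()
    parent = zone.rstrip(".").lower()
    if sre == parent:
        return "@"  # the SRE domain is the zone apex
    suffix = "." + parent
    if sre.endswith(suffix):
        return sre[: -len(suffix)]  # label(s) relative to the zone
    raise ValueError(f"{sre_domain} is not inside the zone {zone}")


print(ns_record_set_name("sandbox.testa.dsgroupdev.co.uk", "testa.dsgroupdev.co.uk"))
print(ns_record_set_name("mydatasafehaven.co.uk", "mydatasafehaven.co.uk"))
```

The first call mirrors the example above and yields sandbox; the second shows the @ case.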
Deploy the virtual network
at ./deployment/secure_research_environment/setup
PS> ./Setup_SRE_Networking.ps1 -shmId <SHM ID> -sreId <SRE ID>
where <SHM ID> is the management environment ID for this SHM
where <SRE ID> is the secure research environment ID for this SRE
Note
The VNet peerings may take a few minutes to provision after the script completes.
Deploy storage accounts
at ./deployment/secure_research_environment/setup
PS> ./Setup_SRE_Storage_Accounts.ps1 -shmId <SHM ID> -sreId <SRE ID>
where <SHM ID> is the management environment ID for this SHM
where <SRE ID> is the secure research environment ID for this SRE
This script will create a storage account in the RG_SHM_<SHM ID>_PERSISTENT_DATA resource group and a corresponding private endpoint in RG_SRE_<SRE ID>_NETWORKING, and will configure the DNS zone of the storage account to point to the right IP address.
Deploy Apache Guacamole remote desktop
at ./deployment/secure_research_environment/setup
PS> ./Setup_SRE_Guacamole_Servers.ps1 -shmId <SHM ID> -sreId <SRE ID>
where <SHM ID> is the management environment ID for this SHM
where <SRE ID> is the secure research environment ID for this SRE
Update SSL certificate
at ./deployment/secure_research_environment/setup
PS> ./Update_SRE_SSL_Certificate.ps1 -shmId <SHM ID> -sreId <SRE ID>
where <SHM ID> is the management environment ID for this SHM
where <SRE ID> is the secure research environment ID for this SRE
where <email> is an email address that you want to be notified when certificates are close to expiry
Tip
./Update_SRE_SSL_Certificate.ps1 should be run again whenever you want to update the certificate for this SRE.
Caution
Let's Encrypt will only issue 5 certificates per week for a particular host (e.g. rdg-sre-sandbox.project.turingsafehaven.ac.uk).
To reduce the number of calls to Let's Encrypt, the signed certificates are stored in the Key Vault for easy redeployment.
For production environments this should usually not be an issue.
Important
If you find yourself frequently redeploying a test environment and hit the Let's Encrypt certificate limit, you can use:
PS> ./Update_SRE_SSL_Certificate.ps1 -dryRun $true
to use the Let's Encrypt staging server, which will issue certificates more frequently.
These certificates will not be trusted by your browser, and so should not be used in production.
Deploy web applications (CodiMD and GitLab)
at ./deployment/secure_research_environment/setup
PS> ./Setup_SRE_WebApp_Servers.ps1 -shmId <SHM ID> -sreId <SRE ID>
where <SHM ID> is the management environment ID for this SHM
where <SRE ID> is the secure research environment ID for this SRE
Deploy databases
at ./deployment/secure_research_environment/setup
PS> ./Setup_SRE_Databases.ps1 -shmId <SHM ID> -sreId <SRE ID>
where <SHM ID> is the management environment ID for this SHM
where <SRE ID> is the secure research environment ID for this SRE
This will deploy any databases that you specified in the core config file. The time taken will depend on which (if any) databases you chose.
Important
The deployment of an MS-SQL database will take around 60 minutes to complete.
The deployment of a PostgreSQL database will take around 10 minutes to complete.
Deploy Secure Research Desktops (SRDs)
The -VmSizes parameter that you provided to the Deploy_SRE.ps1 script determines how many SRDs are created and how large each one will be.
Note
The following script will be run once for each <VM size> that you specified.
If you specify the same size more than once, you will create multiple SRDs of that size.
at ./deployment/secure_research_environment/setup
PS> ./Add_Single_SRD.ps1 -shmId <SHM ID> -sreId <SRE ID> -ipLastOctet <IP last octet> [-vmSize <VM size>]
where <SHM ID> is the management environment ID for this SHM
where <SRE ID> is the secure research environment ID for this SRE
where <IP last octet> is the last octet of the IP address
[optional] where <VM size> is the Azure VM size for this SRD
This will deploy a new SRD into the SRE environment.
Tip
If this SRE needs additional software or settings that are not in your default VM image, you can create a custom cloud init file on your deployment machine.
By default, SRD deployments will use the cloud-init-srd.mustache.yaml configuration file in the deployment/secure_research_environment/cloud_init/ folder. This does all the necessary steps to configure the VM to work with LDAP.
If you require additional steps to be taken at deploy time while the VM still has access to the internet (e.g. to install some additional project-specific software), copy the default cloud init file to a file named cloud-init-srd-shm-<SHM ID>-sre-<SRE ID>.mustache.yaml in the same folder and add any additional required steps in the SRE-SPECIFIC COMMANDS block marked with comments.
Alan Turing Institute default
CPU-based VMs are deployed with the next unused last octet in the range 160 to 179.
GPU-based VMs are deployed with the next unused last octet in the range 180 to 199.
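If you want to follow the same convention when choosing a value for -ipLastOctet, the selection rule can be sketched in Python as below. The function `next_unused_octet` is hypothetical, and the set of octets already in use has to be supplied by you (for example from the Azure portal); the sketch does not look anything up in Azure.

```python
# Ranges from the Alan Turing Institute default described above (inclusive).
CPU_OCTETS = range(160, 180)  # 160 to 179
GPU_OCTETS = range(180, 200)  # 180 to 199


def next_unused_octet(used_octets, gpu=False):
    """Return the lowest octet in the relevant range that is not yet in use."""
    pool = GPU_OCTETS if gpu else CPU_OCTETS
    for octet in pool:
        if octet not in used_octets:
            return octet
    raise RuntimeError("No unused last octet left in this range")


in_use = {160, 161, 180}
print(next_unused_octet(in_use))            # next CPU-based SRD
print(next_unused_octet(in_use, gpu=True))  # next GPU-based SRD
```

Each range holds 20 addresses, so the sketch raises an error once all of them have been allocated.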
Apply network configuration
at ./deployment/secure_research_environment/setup
PS> ./Apply_SRE_Network_Configuration.ps1 -shmId <SHM ID> -sreId <SRE ID>
where <SHM ID> is the management environment ID for this SHM
where <SRE ID> is the secure research environment ID for this SRE
This will apply the locked-down network settings which will restrict access into/out of this SRE.
Configure firewall
at ./deployment/secure_research_environment/setup
PS> ./Setup_SRE_Firewall.ps1 -shmId <SHM ID> -sreId <SRE ID>
where <SHM ID> is the management environment ID for this SHM
where <SRE ID> is the secure research environment ID for this SRE
Configure monitoring
at ./deployment/secure_research_environment/setup
PS> ./Setup_SRE_Monitoring.ps1 -shmId <SHM ID> -sreId <SRE ID>
where <SHM ID> is the management environment ID for this SHM
where <SRE ID> is the secure research environment ID for this SRE
Error
As installing the logging agent can take several minutes, it is possible that some of the commands run in this script might time out. The script should automatically retry any that fail. If you still see any failure messages, please re-run:
PS> ./Setup_SRE_Monitoring.ps1 -shmId $shmId -sreId $sreId
This will attempt to install the extensions again, skipping any VMs that already have the extensions installed.
Enable backup
at ./deployment/secure_research_environment/setup
PS> ./Setup_SRE_Backup.ps1 -shmId <SHM ID> -sreId <SRE ID>
where <SHM ID> is the management environment ID for this SHM
where <SRE ID> is the secure research environment ID for this SRE
This will enable regular backups for the persistent data storage accounts, both ingress and egress data.
4. Test deployed SRE
Verify non-privileged user account is set up
These steps ensure that you have created a non-privileged user account that you can use for testing. You must ensure that you have assigned a licence to this user in the Microsoft Entra ID so that MFA will work correctly.
You should have already set up a non-privileged user account upon setting up the SHM, when validating the active directory synchronisation, but you may wish to set up another or verify that you have set one up already:
Set up a non-privileged user account
Log into the SHM primary domain controller (DC1-SHM-<SHM ID>) VM using the connection details that you previously used to log into this VM.
Follow the user creation instructions from the SHM deployment guide (everything under the Validate Active Directory synchronisation header). In brief these involve:
adding your details (i.e. your first name, last name, phone number etc.) to a user details CSV file.
running C:\Installation\CreateUsers.ps1 <path_to_user_details_file> in a PowerShell command window with elevated privileges.
This will create a user in the local Active Directory on the SHM domain controller and start the process of synchronisation to the Azure Active Directory, which will take around 5 minutes.
Ensure that your non-privileged user account is in the correct Security Group
Log into the SHM primary domain controller (DC1-SHM-<SHM ID>) VM using the connection details that you previously used to log into this VM.
In Server Manager click Tools > Active Directory Users and Computers.
In Active Directory Users and Computers, expand the domain in the left hand panel and click Safe Haven Security Groups.
Right click the SG <SRE ID> Research Users security group and select Properties.
Click on the Members tab.
If your user is not already listed here you must add them to the group:
Click the Add button.
Enter the start of your username and click Check names.
Select your username and click Ok.
Click Ok again to exit the Add users dialogue.
Synchronise with Microsoft Entra ID by running the following PowerShell command on the SHM primary domain controller:
PS> C:\Installation\Run_ADSync.ps1
Ensure that your non-privileged user account has MFA enabled
Switch to your custom Microsoft Entra ID in the Azure portal and make the following checks:
From the Azure portal, navigate to the Microsoft Entra ID you have created.
The Usage Location must be set in Microsoft Entra ID (should be automatically synchronised from the local Active Directory if it was correctly set there).
Navigate to Microsoft Entra ID > Manage / Users > (user account), and ensure that Settings > Usage Location is set.
A licence must be assigned to the user.
Navigate to Microsoft Entra ID > Manage / Users > (user account) > Licenses and verify that a licence is assigned and the appropriate MFA service enabled.
To complete the account setup, follow the instructions for password and MFA setup present in the user guide.
Test the Apache Guacamole remote desktop
Launch a local web browser on your deployment machine and go to https://<SRE ID>.<safe haven domain> and log in with the user name and password you set up for the non-privileged user account.
For example for <safe haven domain> = project.turingsafehaven.ac.uk and <SRE ID> = sandbox this would be https://sandbox.project.turingsafehaven.ac.uk/
You should see a screen like the following. If you do not, follow the troubleshooting instructions below.
At this point you should double click on the Ubuntu0 link under All Connections which should bring you to the secure remote desktop (SRD) login screen.
You will need the short-form of the user name (i.e. without the @<safe haven domain> part) and the same password as before.
This should bring you to the SRD that will look like the following
Important
Ensure that you are connecting from one of the permitted IP ranges specified in the inboundAccessFrom section of the SRE config file.
For example, if you have authorised a corporate VPN, check that you have correctly configured your client to connect to it.
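The login URL and the short-form user name used in this test follow directly from the SRE ID, the safe haven domain and the full user name. The Python sketch below shows the two string transformations; the helper names and the user jane.doe are hypothetical and serve only as illustration.

```python
def sre_login_url(sre_id, safe_haven_domain):
    """Build https://<SRE ID>.<safe haven domain>/ for the remote desktop."""
    return f"https://{sre_id}.{safe_haven_domain}/"


def short_username(full_username):
    """Drop the @<safe haven domain> part of a full user name."""
    return full_username.split("@", 1)[0]


print(sre_login_url("sandbox", "project.turingsafehaven.ac.uk"))
print(short_username("jane.doe@project.turingsafehaven.ac.uk"))
```

The full user name is used on the Guacamole login page, and the short form is used on the SRD login screen.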
Error
If you see an error like the following when attempting to log in, it is likely that the Microsoft Entra application is not registered as an ID token provider.
Register Microsoft Entra application

From the Azure portal, navigate to the Microsoft Entra ID you have created.
Navigate to Microsoft Entra ID > App registrations, and select the application called Guacamole SRE <SRE ID>.
Click on Authentication on the left-hand sidebar.
Ensure that the ID tokens checkbox is ticked and click on the Save icon if you had to make any changes.
Test CodiMD and GitLab servers
Connect to the remote desktop using the instructions above.
Test CodiMD by clicking on the CodiMD desktop icon.
This should open a web browser inside the remote desktop.
Log in with the short-form username of a user in the SG <SRE ID> Research Users security group.
Test GitLab by clicking on the GitLab desktop icon.
This should open a web browser inside the remote desktop.
Log in with the short-form username of a user in the SG <SRE ID> Research Users security group.
Error
Should there be any issues using the web apps (e.g. unable to log in, or log in page not appearing) you can inspect the build log and access the console for the relevant VMs following the guide for System Managers
Run smoke tests on SRD
These tests should be run after the network lock down and peering the SRE and package mirror VNets. They are automatically uploaded to the SRD during the deployment step.
Use the remote desktop interface at https://<SRE ID>.<safe haven domain> to log in to the SRD (SRE-<SRE ID>-<IP last octet>-<version number>) that you have deployed using the scripts above.
Open a terminal session.
Enter the test directory using cd /opt/tests
Run bats run_all_tests.bats:
if any of the tests fail, check the README.md in this folder for help in diagnosing the issues
Copy tests/test_jupyter.ipynb to your home directory
activate each of the available Python versions in turn
run jupyter notebook in each case and check that you can run the notebook and that all versions and paths match throughout. See Available Python and R versions