Managing data ingress and egress#

Important

This document assumes that you already have access to a Safe Haven Management (SHM) environment and one or more Secure Research Environments (SREs) that are linked to it.

Data Ingress#

It is the data provider’s responsibility to upload the data required by the safe haven.

Important

Any data ingress must be signed off by the Dataset Provider Representative, Investigator and Referee (if applicable).

The following steps show how to generate a temporary write-only upload token that can be securely sent to the data provider, enabling them to upload the data:

  • In the Azure portal select Subscriptions then navigate to the subscription containing the relevant SHM

  • Search for the resource group: RG_SHM_<SHM ID>_PERSISTENT_DATA, then click through to the storage account called: <SHM ID><SRE ID>data<storage suffix> (where <storage suffix> is a random string)

  • Click Networking under Settings and paste the data provider’s IP address as one of those allowed under the Firewall header, then hit the save icon in the top left

  • From the Overview tab, click the link to Containers (in the middle of the page)

  • Click ingress

  • Click Shared access signature under Settings and do the following:

    • Under Permissions, check these boxes:

      • Write

      • List

    • Set the Start and expiry date/time to a 24-hour window (or another appropriate length of time)

    • Leave everything else as default and click Generate SAS token and URL

    • Copy the Blob SAS URL

  • Send the Blob SAS URL to the data provider via secure email (for example, you could use the Egress secure email service)

  • The data provider should now be able to upload data by following these instructions

  • You can validate successful data ingress by logging into the SRD for the SRE and checking the /data volume, where you should be able to view the data that the data provider has uploaded
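
For administrators who prefer scripting, the portal steps above can be approximated with the Az PowerShell module. This is a minimal sketch rather than the project's documented procedure. It assumes that the Az.Storage module is installed, that you have already run Connect-AzAccount and selected the subscription containing the SHM, and that your account is allowed to read the storage account keys. The variable names and every value in angle brackets are placeholders.

```powershell
# Placeholders (assumptions): fill these in for your own deployment.
$resourceGroup = "RG_SHM_<SHM ID>_PERSISTENT_DATA"
$accountName   = "<SHM ID><SRE ID>data<storage suffix>"
$providerIp    = "<data provider IP address>"

# Allow the data provider's IP address through the storage account firewall.
Add-AzStorageAccountNetworkRule -ResourceGroupName $resourceGroup -Name $accountName -IPAddressOrRange $providerIp

# Build a storage context from the first account key.
$key     = (Get-AzStorageAccountKey -ResourceGroupName $resourceGroup -Name $accountName)[0].Value
$context = New-AzStorageContext -StorageAccountName $accountName -StorageAccountKey $key

# Generate a write + list ("wl") SAS URL for the ingress container, valid for 24 hours.
$start  = (Get-Date).ToUniversalTime()
$sasUrl = New-AzStorageContainerSASToken -Name "ingress" -Permission "wl" `
    -StartTime $start -ExpiryTime $start.AddHours(24) `
    -Protocol HttpsOnly -Context $context -FullUri

# This is the value to send to the data provider by secure email.
$sasUrl
```

For software ingress, the same sketch applies with a shorter window, for example `$start.AddHours(3)`.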

Software Ingress#

Software ingress is performed in a similar manner to data ingress.

Important

Software ingress must go through the same approval process as data ingress, including sign-off from the Dataset Provider Representative, Investigator and Referee (if applicable).

  • Follow the same steps as for data ingress above to provide temporary write access, but set the time window for the SAS token to a shorter period (e.g. several hours)

  • Share the token with the Investigator, so they can install software within the time window

  • The Investigator can perform software ingress via Azure Storage Explorer (for instance as a zip file), by following the same instructions as the data provider

Data egress#

  • In the Azure portal select Subscriptions then navigate to the subscription containing the relevant SHM

  • Search for the resource group: RG_SHM_<SHM ID>_PERSISTENT_DATA, then click through to the storage account called: <SHM ID><SRE ID>data<storage suffix> (where <storage suffix> is a random string)

  • Click Networking under Settings to check the list of pre-approved IP addresses allowed under the Firewall header and check your own IP address to ensure you are connecting from one of these

  • Click Containers under Data storage

  • Click egress

  • Click Shared access signature under Settings and do the following:

    • Under Permissions, check these boxes:

      • Read

      • List

    • Set a time window in the Start and expiry date/time that gives you enough time to extract the data

    • Leave everything else as default and click Generate SAS token and URL

      [Figure: Read-only SAS token]
    • Leave this portal window open and move to the next step

  • Open Azure Storage Explorer (download it if you don’t have it)

  • Click the socket image on the left hand side

    [Figure: Azure Storage Explorer connection]
  • On Select Resource, choose Blob container

  • On Select Connection Method, choose Shared access signature URL (SAS) and hit Next

    [Figure: Connect with SAS token]
  • On Enter Connection Info:

    • Set the Display name to “egress” (or choose an informative name)

    • Copy the Blob SAS URL from your Azure portal session into the Blob container SAS URL box and hit Next

  • On the Summary page, hit Connect

  • On the left-hand side, the connection should show up under Local & Attached > Storage Accounts > (Attached Containers) > Blob Containers > egress (SAS)

  • You should now be able to securely download the data from the Safe Haven’s output volume by highlighting the relevant file(s) and hitting the Download button
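
As an alternative to the portal and Azure Storage Explorer, the egress steps above can be sketched with the Az PowerShell module. This is an illustration under stated assumptions, not the documented workflow. It assumes that the Az.Storage module is installed, that you are logged in with Connect-AzAccount, that you are connecting from one of the pre-approved IP addresses, and that the download directory already exists. The four-hour window, the variable names and the values in angle brackets are placeholders.

```powershell
# Placeholders (assumptions): fill these in for your own deployment.
$resourceGroup = "RG_SHM_<SHM ID>_PERSISTENT_DATA"
$accountName   = "<SHM ID><SRE ID>data<storage suffix>"
$destination   = "<existing local download directory>"

# List the IP addresses or ranges currently allowed through the firewall.
(Get-AzStorageAccountNetworkRuleSet -ResourceGroupName $resourceGroup -Name $accountName).IpRules

# Generate a read + list ("rl") SAS token for the egress container.
$key        = (Get-AzStorageAccountKey -ResourceGroupName $resourceGroup -Name $accountName)[0].Value
$keyContext = New-AzStorageContext -StorageAccountName $accountName -StorageAccountKey $key
$start      = (Get-Date).ToUniversalTime()
$sasToken   = New-AzStorageContainerSASToken -Name "egress" -Permission "rl" `
    -StartTime $start -ExpiryTime $start.AddHours(4) -Context $keyContext

# Use only the SAS token (not the account key) to list and download the blobs.
$sasContext = New-AzStorageContext -StorageAccountName $accountName -SasToken $sasToken
Get-AzStorageBlob -Container "egress" -Context $sasContext |
    Get-AzStorageBlobContent -Destination $destination
```

Building the second context from the SAS token alone mirrors what Azure Storage Explorer does with the Blob SAS URL: the download is limited to the read and list permissions and to the chosen time window.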

The output volume#

Once you have set up the egress connection in Azure Storage Explorer, you should be able to view data from the output volume, a read-write area intended for the extraction of results, such as figures for publication. On the SRD, this volume is /output and is shared between all SRDs in an SRE. For more info on shared SRE storage volumes, consult the Safe Haven User Guide.

🗄️ Backup#

🗃️ Restoring blobs#

Blob containers in backed-up storage accounts are protected by operational backup. It is possible to restore the state of the blobs to an earlier point in time, up to twelve weeks in the past.

The blob containers covered by this protection for each SRE are:

  • ingress container (mounted at /data)

  • egress container (mounted at /output)

  • backup container (mounted at /backup)

To restore these containers to a previous point in time:

Important

Blobs are restored ‘in place’. The current state will be overwritten with the state at the point in time that you restore to.

  • In the Azure portal select Subscriptions then navigate to the subscription containing the relevant SRE

  • Search for the resource group: RG_SHM_<SHM ID>_SRE_<SRE ID>_BACKUP, then click on the backup vault called: bv-<shm id>-sre-<sre id>

  • Click Backup instances under Manage in the left-hand menu

  • Ensure that the Datasource type filter is set to Azure Blobs (Azure Storage)

    [Figure: Selecting blob backup instances]
  • Click on the storage-account backup instance

  • Select a point in the past to restore to and click Restore

    [Figure: Selecting blob backup restore point]
  • Click on Next: Restore Parameters

  • You can now choose whether to restore all, or a subset of the containers. In the example below the ‘egress’ and ‘backup’ containers are selected

  • Click on Validate

    [Figure: Selecting blob containers to restore and validating]
  • Click on Next: Review + restore

  • Click on Restore

💿 Restoring disks#

Backed-up disks have incremental snapshots taken daily. These snapshots are stored in the backup resource group, RG_SHM_<SHM ID>_SRE_<SRE ID>_BACKUP.

The disks covered by this protection for each SRE are:

  • GitLab data disk

  • CodiMD data disk

  • CoCalc data disk

  • PostgreSQL data disk

  • MSSQL data disk

To restore a disk:

Important

Restoring a disk creates a new disk object from the incremental snapshots. You will need to specify where to create the disk and its name. You will also need to attach the disk to any virtual machines which should use it and enroll the new disk into the backup system.

  • In the Azure portal select Subscriptions then navigate to the subscription containing the relevant SRE

  • Search for the resource group: RG_SHM_<SHM ID>_SRE_<SRE ID>_BACKUP, then click on the backup vault called: bv-<shm id>-sre-<sre id>

  • Click Backup instances under Manage in the left-hand menu

  • Ensure that the Datasource type filter is set to Azure Disks

    [Figure: Selecting disk backup instances]
  • Click on the disk to restore

  • Click Restore

  • Click Select restore point to choose which snapshot to revert to and click Select. By default, only snapshots from the last 30 days are displayed, but this can be adjusted

  • Click Next: Restore Parameters

  • Enter the subscription and resource group in which to create the new disk; these should match the original disk

  • Enter a name for the new disk and click Validate

    [Figure: Configuring and validating disk backup]
  • Click on Next: Review + restore

  • Click on Restore

  • Wait for the restoration to finish. You can monitor the progress on the backup instance page on the Azure portal

  • Navigate to the resource group where the new disk has been created

  • Select the virtual machine that the old disk is attached to and click Disks in the left-hand menu

  • Take note of the old disk’s LUN

  • Remove the old disk by clicking the ‘X’ at the right-hand side of the disk table

  • Click Save

  • Click Attach existing disks and select the disk you restored

  • Ensure the restored disk has the same ‘LUN’ as the old disk

  • Click Save

    [Figures: The state before swapping in the restored disk; the state after swapping in the restored disk]
  • Restart the virtual machine
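
The disk swap above can also be sketched with the Az PowerShell module. This is a hedged illustration, not part of the Data Safe Haven tooling. It assumes that the Az.Compute module is installed, that you are logged in with Connect-AzAccount and have selected the subscription containing the SRE, and that the restore has already finished. The variable names and the values in angle brackets are placeholders.

```powershell
# Placeholders (assumptions): fill these in for your own deployment.
$vmResourceGroup   = "<VM resource group name>"
$vmName            = "<VM name>"
$oldDiskName       = "<old disk name>"
$diskResourceGroup = "<resource group name>"
$newDiskName       = "<disk name>"

$vm = Get-AzVM -ResourceGroupName $vmResourceGroup -Name $vmName

# Note the LUN of the old disk before detaching it.
$lun = ($vm.StorageProfile.DataDisks | Where-Object { $_.Name -eq $oldDiskName }).Lun

# Detach the old disk and save the change.
Remove-AzVMDataDisk -VM $vm -DataDiskNames $oldDiskName | Out-Null
Update-AzVM -ResourceGroupName $vmResourceGroup -VM $vm

# Attach the restored disk at the same LUN and save again.
$newDisk = Get-AzDisk -ResourceGroupName $diskResourceGroup -DiskName $newDiskName
$vm      = Get-AzVM -ResourceGroupName $vmResourceGroup -Name $vmName
$vm      = Add-AzVMDataDisk -VM $vm -Name $newDiskName -ManagedDiskId $newDisk.Id `
    -Lun $lun -CreateOption Attach
Update-AzVM -ResourceGroupName $vmResourceGroup -VM $vm

# Restart the virtual machine so that it picks up the restored disk.
Restart-AzVM -ResourceGroupName $vmResourceGroup -Name $vmName
```

Reusing the recorded LUN matters because the restored disk should be presented to the virtual machine in the same slot as the disk it replaces.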

💿 Enrolling restored disks for backup#

On your deployment machine:

  • Ensure you have the same version of the Data Safe Haven repository as was used by your deployment team

  • Open a PowerShell terminal and navigate to the deployment/administration directory within the Data Safe Haven repository

  • Ensure you are logged into Azure within PowerShell using the command: Connect-AzAccount. This command will give you a URL and a short alphanumeric code. You will need to visit that URL in a web browser and enter the code

  • NB. If your account is a guest in additional Azure tenants, you may need to add the -Tenant <Tenant ID> flag, where <Tenant ID> is the ID of the Azure tenant you want to deploy into

  • Note the name of the restored disk and the name of the resource group it belongs to

  • Run the following script, substituting <resource group name> and <disk name> with the names of the resource group and disk respectively:

    ./SRE_Enroll_Disk_Backup.ps1 -shmId <SHM ID> -sreId <SRE ID> -resourceGroup <resource group name> -diskName <disk name>
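
Put together, the enrolment steps might look like the following session. This is only a sketch: it assumes the Az module is installed and that the terminal is already in the deployment/administration directory. Get-AzDisk is used here simply as one way to confirm the restored disk's name, and every value in angle brackets is a placeholder.

```powershell
# Log in. The -Tenant flag is only needed if your account is a guest in additional tenants.
Connect-AzAccount -Tenant "<Tenant ID>"

# Confirm that the restored disk exists and note its name and resource group.
Get-AzDisk -ResourceGroupName "<resource group name>" | Select-Object Name, ResourceGroupName

# Enroll the restored disk into the backup system.
./SRE_Enroll_Disk_Backup.ps1 -shmId "<SHM ID>" -sreId "<SRE ID>" `
    -resourceGroup "<resource group name>" -diskName "<disk name>"
```

The placeholders are quoted because PowerShell would otherwise treat a bare `<` as a redirection operator. Quoting is harmless once real values are substituted.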
    

📦 Updating allowed repository packages#

For a Tier 3 SRE, only the packages named in the allowlists at environment_configs/package_lists/ can be installed by users.

To update the allowlists on an SHM, you should use the SHM_Package_Repository_Update_Allowlists.ps1 script.

PS> /deployment/administration/SHM_Package_Repository_Update_Allowlists.ps1 -shmId <SHM ID>

By default, this script will use the allowlists present in environment_configs/package_lists/ but you may use the -allowlistDirectory option to specify another directory containing the allowlists. It is assumed that the allowlists will have the same names as those in environment_configs/package_lists/.
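
As a usage sketch, the -allowlistDirectory option could be supplied as shown below. The directory is a hypothetical placeholder, and the relative script path assumes that the terminal is in the deployment/administration directory of the Data Safe Haven repository.

```powershell
# Hypothetical directory holding edited copies of the allowlists.
# The files inside must keep the same names as those in environment_configs/package_lists/.
$allowlistDirectory = "<path to allowlist directory>"

./SHM_Package_Repository_Update_Allowlists.ps1 -shmId "<SHM ID>" -allowlistDirectory $allowlistDirectory
```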