(administrator_manage_data)= # Managing data ingress and egress ```{important} This document assumes that you already have access to a {ref}`Safe Haven Management (SHM) environment ` and one or more {ref}`Secure Research Environments (SREs) ` that are linked to it. ``` (roles_system_manager_data_ingress)= ## Data Ingress It is the data provider's responsibility to upload the data required by the safe haven. ```{important} Any data ingress must be signed off by the {ref}`role_data_provider_representative`, {ref}`role_investigator` and referee (if applicable). ``` The following steps show how to generate a temporary write-only upload token that can be securely sent to the data provider, enabling them to upload the data: - In the Azure portal select `Subscriptions` then navigate to the subscription containing the relevant SHM - Search for the resource group: `RG_SHM__PERSISTENT_DATA`, then click through to the storage account called: `data` (where `` is a random string) - Click `Networking` under `Settings` and paste the data provider's IP address as one of those allowed under the `Firewall` header, then hit the save icon in the top left - From the `Overview` tab, click the link to `Containers` (in the middle of the page) - Click `ingress` - Click `Shared access tokens` under `Settings` and do the following: - Under `Permissions`, check these boxes: - `Write` - `List` - Set a 24 hour time window in the `Start and expiry date/time` (or an appropriate length of time) - Leave everything else as default and click `Generate SAS token and URL` - Copy the `Blob SAS URL` - Send the `Blob SAS URL` to the data provider via secure email (for example, you could use the [Egress secure email](https://www.egress.com/) service) - The data provider should now be able to upload data - You can validate successful data ingress by logging into the SRD for the SRE and checking the `/data` volume, where you should be able to view the data that the data provider has uploaded (roles_system_manager_multiple_providers)= ## Multiple data providers In some projects, there may be more than one data provider responsible for uploading data. Two potential issues that may occur are _name clashes_ and _data leakage_. If all data providers are uploading to the same storage container, then name clashes may occur. There is no protection against overwriting files during upload. Thus, if more than one data provider uploads files with the same path, then the earlier upload will be overwritten. This can be avoided by providing each data provider with their own subfolder on the storage container and ensuring that each uploads only to their subfolder. If all data providers are uploading to the same storage container, then they may be able to see the files uploaded by other data providers. Although they will not be able to access or download these files, a potential issue is that sensitive information may be visible in either the file names or directory structure of the uploaded data. If possible, data providers should avoid using any identifying information in the filenames or directory structure of the data that they upload. This is not always possible, since some data providers may require identifying information to be part of filenames or directory structures. An alternative is to provide separate storage containers for upload for each data provider. These containers should have all the same access restrictions as used for a single ingress storage container. After the data has been uploaded, the {ref}`role_system_manager` can transfer the uploaded data to a single storage container accessible to {ref}`role_researcher`s from the relevant SRE, as per the normal data ingress process. The data-provider-specific containers should be deleted once the data has been transferred. (roles_system_manager_software_ingress)= ## Software Ingress Software ingress is performed in a similar manner to data. ```{important} Software ingress must go through the same approval process as is the case for data ingress, including sign-off from the {ref}`role_data_provider_representative`, {ref}`role_investigator` and referee (if applicable). ``` - Follow the same steps as for {ref}`data ingress ` above to provide temporary write access, but set the time window for the SAS token to a shorter period (e.g. several hours) - Share the token with the {ref}`role_investigator`, so they can install software within the time window - The {ref}`role_investigator` can perform software ingress via `Azure Storage Explorer` (for instance as a zip file) (roles_system_manager_data_egress)= ## Data egress ```{important} Any data egress must be signed off by the {ref}`role_data_provider_representative`, {ref}`role_investigator` and referee (if applicable). ``` ```{important} Assessment of output must be completed **before** an egress link is created. ``` The {ref}`role_system_manager` creates a time-limited and IP restricted link to remove data from the environment, after the outputs have been classified and approved for release. - In the Azure portal select `Subscriptions` then navigate to the subscription containing the relevant SHM - Search for the resource group: `RG_SHM__PERSISTENT_DATA`, then click through to the storage account called: `data` (where `` is a random string) - Click `Networking` under `Settings` to check the list of pre-approved IP addresses allowed under the `Firewall` header - Ensure that the IP address of the person to receive the outputs is listed and enter it if not - Click `Containers` under `Data storage` - Click `egress` - Click `Shared access tokens` under `Settings` and do the following: - Under `Permissions`, check these boxes: - `Read` - `List` - Set a time window in the `Start and expiry date/time` that gives enough time for the person who will perform the secure egress download to do so - Leave everything else as default click `Generate SAS token and URL` ```{image} administrator_guide/read_only_sas_token.png :alt: Read-only SAS token :align: center ``` - Leave this portal window open and move to the next step - The appropriate person should now be able to download data ### The output volume Once you have set up the egress connection in `Azure Storage Explorer`, you should be able to view data from the **output volume**, a read-write area intended for the extraction of results, such as figures for publication. On the SRD, this volume is `/output` and is shared between all SRDs in an SRE. For more info on shared SRE storage volumes, consult the {ref}`Safe Haven User Guide `. ## {{file_cabinet}} Backup ### {{card_file_box}} Restoring blobs Blob containers in backed up storage accounts are protected by [operational backup](https://learn.microsoft.com/en-us/azure/backup/blob-backup-overview#how-the-operational-backup-works). It is possible to restore the state of the blobs to an earlier point in time, up to twelve weeks in the past. The blob containers covered by the protection for each SRE are the - ingress container (mounted at `/data`) - egress container (mounted at `/output`) - backup container (mounted at `/backup`) To restore these containers to a previous point in time: ```{important} Blobs are restored 'in place'. The current state will be overwritten by the point which you restore to. ``` - In the Azure portal select `Subscriptions` then navigate to the subscription containing the relevant SRE - Search for the resource group: `RG_SHM__SRE__BACKUP`, then click on the storage account called: `bv--sre-` - Click `Backup instances` under `Manage` in the left-hand menu - Ensure that the `Datasource type` filter is set to `Azure Blobs (Azure Storage)` ```{image} administrator_guide/backup_instances_blobs.png :alt: Selecting blob backup instances :align: center ``` - Click on the storage-account backup instance - Select a point in the past to restore to and click `Restore` ```{image} administrator_guide/backup_select_restore_time_blobs.png :alt: Selecting blob backup restore point :align: center ``` - Click on `Next: Restore Parameters` - You can now choose whether to restore all, or a subset of the containers. In the example below the 'egress' and 'backup' containers are selected - Click on `Validate` ```{image} administrator_guide/backup_select_containers_validate_blobs.png :alt: Selecting blob containers to restore and validating :align: center ``` - Click on `Next: Review + restore` - Click on `Restore` ### {{optical_disk}} Restoring disks Backed up disks have incremental snapshots taken daily. These snapshots are stored in the backup resource group,`RG_SHM__SRE__BACKUP`. The disks covered by the protection for each SRE are the - GitLab data disk - CodiMD data disk - PostgreSQL data disk - MSSQL data disk To restore a disk: ```{important} Restoring a disk creates a new disk object from the incremental snapshots. You will need to specify where to create the disk and its name. You will also need to attach the disk to any virtual machines which should use it and enroll the new disk into the backup system. ``` - In the Azure portal select `Subscriptions` then navigate to the subscription containing the relevant SRE - Search for the resource group: `RG_SHM__SRE__BACKUP`, then click on the storage account called: `bv--sre- - Click `Backup instances` under `Manage` in the left-hand menu - Ensure that the `Datasource type` filter is set to `Azure Disks` ```{image} administrator_guide/backup_instances_disks.png :alt: Selecting disk backup instances :align: center ``` - Click on the disk to restore - Click `Restore` - Click `Select restore point` to choose which snapshot to revert to and click `Select`. By default only snapshots from the last 30 days are displayed but this can be adjusted - Click `Next: Restore Parameters` - Enter the subscription and resource group in which to create the new disk; these should match the original disk - Enter a name for the new disk and click `Validate` ```{image} administrator_guide/backup_select_snapshot_validate_disks.png :alt: Configuring and validating disk backup :align: center ``` - Click on `Next: Review + restore` - Click on `Restore` - Wait for the restoration to finish. You can monitor the progress on the backup instance page on the Azure portal ```{image} administrator_guide/backup_progress_disk_1.png :align: center ``` ```{image} administrator_guide/backup_progress_disk_2.png :align: center ``` ```{image} administrator_guide/backup_progress_disk_3.png :align: center ``` - Navigate to the resource group where the new disk has been created - Select the virtual machine that the old disk is attached to and click `Disks` in the left-hand menu - Take note of the old disks `LUN` - Remove the old disk by clicking the 'X' at the right-hand side of the disk table - Click `Save` - Click `Attach existing disks` and select the disk you restored - Ensure the restored disk has the same 'LUN' as the old disk - Click `Save` ```{image} administrator_guide/backup_swap_disk_before.png :alt: The state before swapping in the restored disk :align: center ``` ```{image} administrator_guide/backup_swap_disk_after.png :alt: The state after swapping in the restored disk :align: center ``` - Restart the virtual machine ### {{optical_disk}} Enrolling restored disks for backup On your **deployment machine**. - Ensure you have the same version of the Data Safe Haven repository as was used by your deployment team - Open a `Powershell` terminal and navigate to the `deployment/administration` directory within the Data Safe Haven repository - Ensure you are logged into `Azure` within `Powershell` using the command: `Connect-AzAccount`. This command will give you a URL and a short alphanumeric code. You will need to visit that URL in a web browser and enter the code - NB. If your account is a guest in additional Azure tenants, you may need to add the `-Tenant ` flag, where `` is the ID of the Azure tenant you want to deploy into - Note the name of the restored disk and the name of the resource group it belongs to - Run the following script subsituting and with the names of the resource group and disk respectively: ```powershell ./SRE_Enroll_Disk_Backup.ps1 -shmId -sreId -resourceGroup -diskName ``` ## {{package}} Updating allowed repository packages For a {ref}`policy_tier_3` SRE, only the packages named in the allowlists at `environment_configs/package_lists/` can be installed by users. To update the allowlists on an SHM, you should use the `SHM_Package_Repository_Update_Allowlists.ps1` script. ```powershell PS> /deployment/administration/SHM_Package_Repository_Update_Allowlists.ps1 -shmId ``` By default, this script will use the allowlists present in `environment_configs/package_lists/` but you may use the `-allowlistDirectory` option to specify another directory containing the allowlists. It is assumed that the allowlists will have the same names as those in in `environment_configs/package_lists/`.