Using the Secure Research Environment#

๐Ÿง Linux Basics#

If you have never used a Linux computer before, you may find some of the following resources helpful:

📰 Transferring files into or out of the SRE#

Each request to bring data or software into the SRE (“ingress”) or to take it out (“egress”) must be reviewed in case it represents a security risk. These reviews are coordinated by the designated contact for your SRE, who will discuss with the project’s principal investigator and the data provider whether the request poses an acceptable risk to data security. The decision might be “no”.

Hint

You can make the process as easy as possible by providing as much information as you can about the code or data. For instance, describing in detail what a dataset contains and how it will be used will help speed up decision making.

✂️ Copy and paste#

It is always possible to use copy and paste as normal within an SRE workspace. However, the ability to copy and paste text to or from an SRE workspace depends on the specific configuration of the SRE. The system manager can configure the SRE workspaces to allow copying text from a workspace, pasting text into a workspace, both, or neither. Copy and paste of anything other than text to or from a workspace is not possible.

📚 Maintaining an archive of the project#

SREs are designed to be ephemeral and are only deployed for as long as necessary. It is likely that both the infrastructure and the data will be permanently deleted when work has concluded.

The /mnt/output/ directory is designed for storing output to be kept after a project concludes. You should move such data to the /mnt/output/ directory and contact your designated contact about data egress.

Important

You are responsible for deciding what is worth archiving.

While working on the project:

  • store all your code in a Gitea repository.

  • store all resources that might be useful to the rest of the project in the /mnt/shared/ folder.

  • store anything that might form an output from the project (e.g. images, documents or output datasets) in the /mnt/output/ folder.

See the section on sharing files to find out more about where to store your files.

📦 Pre-installed applications#

The workspace has several pre-installed applications and programming languages to help with your data analysis.

If you need anything that is not already installed, please discuss this with the designated contact for your SRE.

You can access applications from the desktop using either:

  • the Terminal app, accessible from the dock at the bottom of the screen

  • the drop-down menu that appears when you right-click on the desktop or click the Applications button at the top left of the screen

How to access applications from the desktop

A few specific examples are given below.

👩‍💻 VSCodium#

You can start VSCodium from the Applications ‣ Development menu.

Running VSCodium

โซ R and RStudio#

Typing R at the command line will give you a pre-installed version of R.

Running R from a terminal

Or you can use RStudio or VSCodium from the Applications ‣ Development menu.

Running RStudio

๐Ÿ Python and Pycharm#

Typing python at the command line will give you a pre-installed version of Python.

Running Python from a terminal

Or you can use PyCharm from the Applications ‣ Development menu.

Running PyCharm

๐ŸŽ Installing software packages#

You have access to packages from the PyPI and CRAN repositories from the SRE. You can install packages you need from these copies in the usual way, for example pip install (Python) and install.packages (R).

Depending on the sensitivity level of your SRE, you may only have access to a subset of R and Python packages:

  • Tier 2 (medium security) environments have access to all packages on PyPI and CRAN.

  • Tier 3 (high security) environments only have pre-authorised packages available.

Tip

If you need to use a package that is not on the allowlist, see the section on how to bring software or data into the environment.

Python packages#

Note

You will not have permissions to install packages system-wide. We recommend using a virtual environment.

You can create one:
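One way to do this, sketched here with Python's standard-library `venv` module, is shown below. The helper name `create_virtual_environment` and any directory you pass to it are hypothetical choices for illustration, not something the SRE prescribes.

```python
import venv
from pathlib import Path


def create_virtual_environment(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment in env_dir and return the path to its Python."""
    # with_pip=True bootstraps pip inside the environment, so packages are
    # installed there rather than system-wide.
    venv.create(env_dir, with_pip=with_pip)
    # On Linux the environment's interpreter lives in its bin/ directory.
    return env_dir / "bin" / "python"
```

For example, `create_virtual_environment(Path.home() / "venvs" / "analysis")` would create an environment in a (hypothetical) `venvs/analysis` folder in your home directory. The equivalent terminal command is `python -m venv DIRECTORY`, after which `source DIRECTORY/bin/activate` activates the environment in the current shell.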

You can install Python packages into your virtual environment from a terminal.

> pip install NAME_OF_PACKAGE

R packages#

Note

You will not have permissions to install packages system-wide. You will need to use a user package directory.

You can install R packages from inside R (or RStudio):

> install.packages("NAME_OF_PACKAGE")

You will see something like the following:

Installing package into '/usr/local/lib/R/site-library'
(as 'lib' is unspecified)
Warning in install.packages("cluster") :
  'lib = "/usr/local/lib/R/site-library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel)

Type yes, which prompts you to confirm the name of the library:

Would you like to create a personal library
'~/R/x86_64-pc-linux-gnu-library/4.1'
to install packages into? (yes/No/cancel)

Type yes to install the packages.

📂 Sharing files inside the SRE#

There are several shared folders on each workspace that all collaborators within a research project team can see and access:

Input data#

Data that has been approved and brought into the secure research environment can be found in the /mnt/input/ folder.

  • The contents of /mnt/input/ will be identical on all workspaces in your SRE.

  • Everyone working on your project will be able to access it.

  • Everyone has read-only access to the files stored here.

If you are using the Data Safe Haven as part of an organised event, you might find additional resources in the /mnt/input/ folder, such as example slides or document templates.

Important

You will not be able to change any of the files in /mnt/input/. If you want to make derived datasets, for example cleaned and reformatted data, please add those to the /mnt/shared/ or /mnt/output/ folders.
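As an illustration of this workflow, the following sketch reads a CSV file from the read-only input folder and writes a cleaned copy elsewhere. The function name, the file names in the usage note and the cleaning rule (dropping rows with any empty field) are hypothetical; only the `/mnt/input/` and `/mnt/shared/` locations come from this guide.

```python
import csv
from pathlib import Path


def write_cleaned_copy(source: Path, destination: Path) -> int:
    """Copy the CSV at source to destination, skipping rows with empty fields.

    Returns the number of data rows written. The source file is only read,
    never modified.
    """
    destination.parent.mkdir(parents=True, exist_ok=True)
    rows_written = 0
    with source.open(newline="") as src, destination.open("w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty input file: nothing to copy
        writer.writerow(header)
        for row in reader:
            # Hypothetical cleaning rule: keep only fully populated rows.
            if row and all(field.strip() for field in row):
                writer.writerow(row)
                rows_written += 1
    return rows_written
```

A call such as `write_cleaned_copy(Path("/mnt/input/survey.csv"), Path("/mnt/shared/derived/survey_clean.csv"))` would leave the original untouched and make the derived file visible to the rest of the team.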

Shared space#

The /mnt/shared/ folder should be used for any work that you want to share with your group.

  • The contents of /mnt/shared/ will be identical on all workspaces in your SRE.

  • Everyone working on your project will be able to access it.

  • Everyone has read-and-write access to the files stored here.

Output resources#

Any outputs that you want to extract from the secure environment should be placed in the /mnt/output/ folder on the workspace.

  • The contents of /mnt/output/ will be identical on all workspaces in your SRE.

  • Everyone working on your project will be able to access it.

  • Everyone has read-and-write access to the files stored here.

Anything placed in here will be considered for data egress - removal from the secure research environment - by the project’s principal investigator together with the data provider.

Tip

You may want to consider having subfolders of /mnt/output/ to make the review of this directory easier.
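One possible layout, sketched below, groups outputs by type under a labelled subfolder so that reviewers can see at a glance what is being proposed for egress. The function name, the subfolder names and the extension-to-folder mapping are assumptions for illustration, not a required structure.

```python
import shutil
from pathlib import Path

# Hypothetical mapping from file extension to a review-friendly subfolder.
SUBFOLDER_BY_SUFFIX = {
    ".png": "figures",
    ".pdf": "documents",
    ".csv": "datasets",
}


def stage_for_egress(files: list[Path], output_root: Path, label: str) -> list[Path]:
    """Copy files into output_root/<label>/<subfolder>/ and return the new paths."""
    staged = []
    for file in files:
        # Anything with an unrecognised extension goes into "other".
        subfolder = SUBFOLDER_BY_SUFFIX.get(file.suffix.lower(), "other")
        target_dir = output_root / label / subfolder
        target_dir.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps, so reviewers can see when a file was made.
        staged.append(Path(shutil.copy2(file, target_dir / file.name)))
    return staged
```

For instance, `stage_for_egress(my_files, Path("/mnt/output"), "report-draft-1")` would place images under `/mnt/output/report-draft-1/figures/` and so on, where `my_files` and the label are placeholders of your choosing.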

💊 Version control using Gitea#

Gitea[1] is an open-source code hosting platform for version control and collaboration - similar to GitHub. It allows you to use git to version control your work, coordinate tasks using issues and review work using pull requests.

The Gitea server within the SRE can hold code, documentation and results from your team’s analyses. Use the Gitea server to work collaboratively on code with other project team members.

Important

This Gitea server is entirely within the SRE - you do not need to worry about the security of the information you upload there as it is inaccessible from the public internet.

You can access Gitea from an internet browser in the workspace using the desktop shortcut. Use your short-form username and password to log in.

📖 Collaborative writing using HedgeDoc#

HedgeDoc[2] is an open-source document hosting platform for collaboration - similar to HackMD. It uses Markdown[3] which is a simple way to format your text so that it renders nicely in HTML.

The HedgeDoc server within the SRE can hold documents relating to your team’s analyses. Use the HedgeDoc server to work collaboratively on documents with other project team members.

Important

This HedgeDoc server is entirely within the SRE - you do not need to worry about the security of the information you upload there as it is inaccessible from the public internet.

You can access HedgeDoc from an internet browser in the workspace using the desktop shortcut. Use your short-form username and password to log in.

📗 Database access#

Your project might use a database to hold the input data. You might also, or instead, be provided with a database for use in analysing the data. The database server will be either Microsoft SQL Server or PostgreSQL.

If you have access to one or more databases, you can access them using the following details, replacing SRE_URL with the SRE URL for your project.

For guidance on how to use the databases, many resources are available on the internet. Official tutorials for MSSQL and PostgreSQL may be good starting points.

Examples are given below for connecting using Beekeeper Studio, Python and R. The instructions for using other graphical interfaces or programming languages will be similar.

๐Ÿ Connecting using Beekeeper Studio#

๐Ÿ Connecting using Python#

Database connections can be made using pyodbc (Microsoft SQL Server) or psycopg2 (PostgreSQL). The data can be read into a dataframe for local analysis.
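A minimal sketch of both approaches follows. All connection details here are assumptions: the host name, port, database name, ODBC driver name and the use of username and password authentication are placeholders, so substitute the values supplied for your project. The helper names are hypothetical, and the database libraries are imported inside the functions so that the file loads even when only one of them is installed.

```python
def mssql_connection_string(server: str, port: int, database: str,
                            username: str, password: str) -> str:
    """Build an ODBC connection string for Microsoft SQL Server."""
    # Assumption: the driver name must match a driver installed on the workspace.
    return (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        f"SERVER={server},{port};DATABASE={database};"
        f"UID={username};PWD={password};"
    )


def connect_mssql(server: str, port: int, database: str,
                  username: str, password: str):
    """Open a pyodbc connection to a Microsoft SQL Server database."""
    import pyodbc

    return pyodbc.connect(
        mssql_connection_string(server, port, database, username, password)
    )


def connect_postgres(host: str, port: int, database: str,
                     username: str, password: str):
    """Open a psycopg2 connection to a PostgreSQL database."""
    import psycopg2

    return psycopg2.connect(host=host, port=port, dbname=database,
                            user=username, password=password)


def fetch_rows(connection, query: str):
    """Run query on any DB-API connection; return (column_names, rows)."""
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        # The first item of each description entry is the column name.
        columns = [column[0] for column in cursor.description]
        # Convert driver-specific row objects into plain tuples.
        rows = [tuple(row) for row in cursor.fetchall()]
    finally:
        cursor.close()
    return columns, rows
```

Given a connection from either helper, `columns, rows = fetch_rows(connection, "SELECT ...")` returns plain Python data. If pandas is available in your workspace (an assumption), `pandas.DataFrame(rows, columns=columns)` turns the result into a dataframe for local analysis.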

🌹 Connecting using R#

Database connections can be made using odbc (Microsoft SQL Server) or RPostgres (PostgreSQL). The data can be read into a dataframe for local analysis.