API Reference

class datashuttle.datashuttle.DataShuttle(project_name, print_startup_message=True)

DataShuttle is a tool for convenient scientific project management and data transfer in BIDS format.

The expected organisation is a central repository on a central machine (‘central’) that contains all project data. This is connected to multiple local machines (‘local’), each of which can contain a subset of the full project (e.g. a machine for electrophysiology collection, a machine for behavioural collection).

On first use on a new profile, a warning is shown prompting you to set configurations with make_config_file().

Datashuttle saves logs to a .datashuttle folder in the local project folder. These logs contain detailed information on folder creation / transfer. To get the path to the datashuttle logs, use get_logging_path().

To transfer data to and from central data storage over SSH, first call setup_ssh_connection(). This lets you check the server key, adds the host key to the profile if accepted, and sets up an SSH key pair.

Parameters:
  • project_name (str) – The name of the project to use with datashuttle. Folders containing all project files and folders are specified in make_config_file(). Datashuttle-related files are stored in a .datashuttle folder in the user home folder. Use get_datashuttle_path() to see the path to this folder.

  • print_startup_message (bool) – If True, a start-up message displaying the current state of the program (e.g. persistent settings such as the ‘top-level folder’) is shown.

Methods

check_name_formatting(names, prefix)

Pass a list of names to check how they will be auto-formatted, for example when passed to create_folders(), upload_custom() or download_custom().

create_folders(top_level_folder, sub_names)

Create a subject / session folder tree in the project folder.

download_custom(top_level_folder, sub_names, ...)

Download data from the central project folder to the local project folder.

download_derivatives([...])

Download files in the derivatives top level folder.

download_entire_project([...])

Download the entire project (from 'central' to 'local'), i.e. including every top level folder (e.g. 'rawdata', 'derivatives', 'code', 'analysis').

download_rawdata([overwrite_existing_files, ...])

Download files in the rawdata top level folder.

download_specific_folder_or_file(filepath[, ...])

Download a specific file or folder.

get_central_path()

Get the project central path.

get_config_path()

Get the full path to the DataShuttle config file.

get_datashuttle_path()

Get the path to the local datashuttle folder where configs and other datashuttle files are stored.

get_existing_projects()

Get a list of existing project names found on the local machine.

get_local_path()

Get the project's local path.

get_logging_path()

Get the path where datashuttle logs are written.

get_name_templates()

Get the regexp templates used for validation.

get_next_ses(top_level_folder, sub[, ...])

Convenience function for get_next_sub_or_ses to find the next session number.

get_next_sub(top_level_folder[, ...])

Convenience function for get_next_sub_or_ses to find the next subject number.

make_config_file(local_path, central_path, ...)

Initialise the configurations for datashuttle to use on the local machine.

set_name_templates(new_name_templates)

Update the persistent settings with new name templates.

setup_ssh_connection()

Set up a connection to the central server using SSH.

show_configs()

Print the current configs to the terminal.

upload_custom(top_level_folder, sub_names, ...)

Upload data from a local project to the central project folder.

upload_derivatives([...])

Upload files in the derivatives top level folder.

upload_entire_project([...])

Upload the entire project (from 'local' to 'central'), i.e. including every top level folder (e.g. 'rawdata', 'derivatives', 'code', 'analysis').

upload_rawdata([overwrite_existing_files, ...])

Upload files in the rawdata top level folder.

upload_specific_folder_or_file(filepath[, ...])

Upload a specific file or folder.

validate_project(top_level_folder, error_or_warn)

Perform validation on the project. This checks the subject and session level folders to ensure that: the digit lengths are consistent (e.g. 'sub-001' with 'sub-02' is not allowed); 'sub-' or 'ses-' is the first key of the sub / ses names; names only include integers, letters, dashes or underscores; names are checked against name templates (if set); and no duplicate names exist across the project (e.g. 'sub-001' and 'sub-001_date-1010120').

write_public_key(filepath)

By default, only the SSH private key is stored, in the datashuttle configs folder.

get_configs

update_config_file

create_folders(top_level_folder, sub_names, ses_names=None, datatype='', bypass_validation=False, log=True)

Create a subject / session folder tree in the project folder. The passed subject / session names are formatted and validated. If this succeeds, full validation against all subject / session folders in the local project is performed before the folders are made.

Parameters:
  • top_level_folder (TopLevelFolder) – Whether to make the folders in rawdata or derivatives.

  • sub_names (Union[str, List[str]]) – A subject name / list of subject names to make within the top-level project folder. Names are prefixed with “sub-” if they are not already.

  • ses_names (Optional[Union[str, List[str]]]) – (Optional). A session name / list of session names. Names are prefixed with “ses-” if they are not already. If no session is provided, no session-level folders are made.

  • datatype (Union[str, List[str]]) – The datatype(s) to make in the sub / ses folders (e.g. “ephys”, “behav”, “anat”). If “all” is selected, all datatypes permitted in NeuroBlueprint will be created. If “” is passed, no datatype folders will be created.

  • bypass_validation (bool) – If True, folders will be created even if they are not valid according to the NeuroBlueprint style.

  • log (bool) – If True, details of folder creation will be logged.

Returns:

A dictionary of the full filepaths made during folder creation, where the keys are the type of folder made and the values are lists of created folder paths (Path objects). If datatype folders were created, the dict keys separate the created folders by datatype name. Similarly, if only subject or session level folders were created, these are separated by “sub” and “ses” keys.

Return type:

Dict[str, List[Path]]
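To make the folder tree and the shape of the returned dictionary concrete, here is a minimal sketch that uses only pathlib. make_tree() is a hypothetical helper, not datashuttle's implementation: it skips name formatting and validation, and, as a simplifying assumption, it only places datatype folders inside session folders.

```python
from pathlib import Path
from typing import Dict, List


def make_tree(
    project: Path,
    top_level_folder: str,
    sub_names: List[str],
    ses_names: List[str],
    datatypes: List[str],
) -> Dict[str, List[Path]]:
    """Hypothetical sketch: build sub / ses / datatype folders and report them."""
    created: Dict[str, List[Path]] = {}
    for sub in sub_names:
        sub_path = project / top_level_folder / sub
        sub_path.mkdir(parents=True, exist_ok=True)
        if not ses_names:
            # Only subject-level folders were made: group them under "sub".
            created.setdefault("sub", []).append(sub_path)
            continue
        for ses in ses_names:
            ses_path = sub_path / ses
            ses_path.mkdir(exist_ok=True)
            if not datatypes:
                # No datatype folders: group session folders under "ses".
                created.setdefault("ses", []).append(ses_path)
            for datatype in datatypes:
                # Datatype folders are grouped by datatype name.
                datatype_path = ses_path / datatype
                datatype_path.mkdir(exist_ok=True)
                created.setdefault(datatype, []).append(datatype_path)
    return created
```

Calling make_tree(root, "rawdata", ["sub-001", "sub-002"], ["ses-001"], ["ephys", "behav"]) returns a dict with "ephys" and "behav" keys, each listing two session-level datatype folders.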

Notes

sub_names or ses_names may contain the following formatting tags:

@TO@ :

used to make a range of subjects / sessions. The boundaries of the range must be on either side of the tag, e.g. sub-001@TO@003 will generate

[“sub-001”, “sub-002”, “sub-003”]

@DATE@, @TIME@, @DATETIME@ :

will add date-<value>, time-<value> or date-<value>_time-<value> keys respectively. Only one of these tags is permitted per name, e.g. sub-001_@DATE@ will generate sub-001_date-20220101 (on 1 January 2022).
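The @TO@ and @DATE@ behaviours described above can be sketched as follows. expand_to_tag() and replace_date_tag() are hypothetical helpers written for illustration; datashuttle's own formatting code may differ. Padding the range to the width of the start number is an assumption based on the sub-001@TO@003 example, and the @TIME@ / @DATETIME@ formats are not covered.

```python
import datetime
import re
from typing import List, Optional


def expand_to_tag(name: str) -> List[str]:
    """Hypothetical sketch: expand e.g. "sub-001@TO@003" into a list of names."""
    if "@TO@" not in name:
        return [name]
    left, right = name.split("@TO@", 1)
    # The start number ends the left side; the end number begins the right side.
    start_match = re.match(r"^(.*?)(\d+)$", left)
    end_match = re.match(r"^(\d+)(.*)$", right)
    if start_match is None or end_match is None:
        raise ValueError(f"Range boundaries must be on either side of @TO@: {name!r}")
    prefix, start = start_match.groups()
    end, suffix = end_match.groups()
    width = len(start)  # assumption: zero-pad to the width of the start number
    return [
        f"{prefix}{number:0{width}d}{suffix}"
        for number in range(int(start), int(end) + 1)
    ]


def replace_date_tag(name: str, today: Optional[datetime.date] = None) -> str:
    """Hypothetical sketch: replace @DATE@ with a date-YYYYMMDD key."""
    today = today or datetime.date.today()
    return name.replace("@DATE@", f"date-{today:%Y%m%d}")
```

For example, expand_to_tag("sub-002@TO@005") returns ["sub-002", "sub-003", "sub-004", "sub-005"].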

Examples

project.create_folders("rawdata", "sub-001", datatype="all")

project.create_folders(
    "rawdata", "sub-002@TO@005", ["ses-001", "ses-002"], ["ephys", "behav"]
)

upload_custom(top_level_folder, sub_names, ses_names, datatype='all', overwrite_existing_files='never', dry_run=False, init_log=True)

Upload data from a local project to the central project folder. By default (overwrite_existing_files='never'), if a file / folder exists on both central and local, the central copy will not be overwritten, even if it is an older version. Data transfer logs are saved to the logging folder.

Parameters:
  • top_level_folder (Literal['rawdata', 'derivatives']) – The top-level folder (e.g. rawdata) to transfer files and folders within.

  • sub_names (Union[str, list]) – A subject name / list of subject names. These should be prefixed with “sub-”; otherwise, the prefix will be added automatically. “@*@” can be used as a wildcard. “all” will search for all sub-folders in the datatype folder to upload.

  • ses_names (Union[str, list]) – A session name / list of session names, similar to sub_names but with a “ses-” prefix.

  • datatype (Union[List[str], str]) – see create_folders()

  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If “never”, files on the target will never be overwritten by the source. If “always”, files on the target will be overwritten by the source if there is any difference in date or size. If “if_source_newer”, files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if file transfer were taking place, but no files will be moved. Useful to check which files will be moved on data transfer.

  • init_log (bool) – (Optional). Whether to handle logging. This should always be True, unless logger is handled elsewhere (e.g. in a calling function).

Return type:

None
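How a sub_names entry with automatic prefixing, the “@*@” wildcard or “all” might select existing folders can be sketched with the standard fnmatch module. select_names() is hypothetical and only illustrates the documented rules; treating “@*@” exactly like a shell-style “*” is an assumption.

```python
import fnmatch
from typing import List


def select_names(name: str, existing: List[str], prefix: str = "sub-") -> List[str]:
    """Hypothetical sketch of how a sub_names / ses_names entry could select folders."""
    if name == "all":
        # "all": every existing folder carrying the prefix.
        return [folder for folder in existing if folder.startswith(prefix)]
    if not name.startswith(prefix):
        # A missing prefix is added automatically.
        name = prefix + name
    # Assumption: "@*@" behaves like a shell-style "*" wildcard.
    pattern = name.replace("@*@", "*")
    return [folder for folder in existing if fnmatch.fnmatchcase(folder, pattern)]
```

With folders ["sub-001_date-20220101", "sub-002_date-20220102", "sub-003_date-20230101"], the entry "sub-@*@_date-2022@*@" selects the first two.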

download_custom(top_level_folder, sub_names, ses_names, datatype='all', overwrite_existing_files='never', dry_run=False, init_log=True)

Download data from the central project folder to the local project folder.

Parameters:
  • top_level_folder (Literal['rawdata', 'derivatives']) – The top-level folder (e.g. rawdata) to transfer files and folders within.

  • sub_names (Union[str, list]) – A subject name / list of subject names. These should be prefixed with “sub-”; otherwise, the prefix will be added automatically. “@*@” can be used as a wildcard. “all” will search for all sub-folders in the datatype folder to download.

  • ses_names (Union[str, list]) – A session name / list of session names, similar to sub_names but with a “ses-” prefix.

  • datatype (Union[List[str], str]) – see create_folders()

  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If “never”, files on the target will never be overwritten by the source. If “always”, files on the target will be overwritten by the source if there is any difference in date or size. If “if_source_newer”, files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if file transfer were taking place, but no files will be moved. Useful to check which files will be moved on data transfer.

  • init_log (bool) – (Optional). Whether to handle logging. This should always be True, unless logger is handled elsewhere (e.g. in a calling function).

Return type:

None
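The three overwrite_existing_files modes can be modelled as a small decision function. This is a hypothetical sketch of the documented behaviour only: FileInfo and should_overwrite() are not part of datashuttle, and datashuttle may compare files differently in practice.

```python
from dataclasses import dataclass
from typing import Literal

OverwriteMode = Literal["never", "always", "if_source_newer"]


@dataclass
class FileInfo:
    """Hypothetical record of the file properties the modes compare."""
    size: int
    modified: float  # modification time, e.g. as returned by os.path.getmtime


def should_overwrite(source: FileInfo, target: FileInfo, mode: OverwriteMode) -> bool:
    """Decide whether an existing target file is replaced by the source file."""
    if mode == "never":
        return False
    if mode == "always":
        # Any difference in date or size triggers an overwrite.
        return source.size != target.size or source.modified != target.modified
    if mode == "if_source_newer":
        # Only a newer source replaces the target.
        return source.modified > target.modified
    raise ValueError(f"Unknown overwrite mode: {mode!r}")
```

Keeping "never" as the default means a transfer can only add files to the target, which is the safest behaviour when several local machines share one central project.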

upload_rawdata(overwrite_existing_files='never', dry_run=False)

Upload files in the rawdata top level folder.

Parameters:
  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If “never”, files on the target will never be overwritten by the source. If “always”, files on the target will be overwritten by the source if there is any difference in date or size. If “if_source_newer”, files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if file transfer were taking place, but no files will be moved. Useful to check which files will be moved on data transfer.

upload_derivatives(overwrite_existing_files='never', dry_run=False)

Upload files in the derivatives top level folder.

Parameters:
  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If “never”, files on the target will never be overwritten by the source. If “always”, files on the target will be overwritten by the source if there is any difference in date or size. If “if_source_newer”, files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if file transfer were taking place, but no files will be moved. Useful to check which files will be moved on data transfer.

download_rawdata(overwrite_existing_files='never', dry_run=False)

Download files in the rawdata top level folder.

Parameters:
  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If “never”, files on the target will never be overwritten by the source. If “always”, files on the target will be overwritten by the source if there is any difference in date or size. If “if_source_newer”, files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if file transfer were taking place, but no files will be moved. Useful to check which files will be moved on data transfer.

download_derivatives(overwrite_existing_files='never', dry_run=False)

Download files in the derivatives top level folder.

Parameters:
  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If “never”, files on the target will never be overwritten by the source. If “always”, files on the target will be overwritten by the source if there is any difference in date or size. If “if_source_newer”, files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if file transfer were taking place, but no files will be moved. Useful to check which files will be moved on data transfer.

upload_entire_project(overwrite_existing_files='never', dry_run=False)

Upload the entire project (from ‘local’ to ‘central’), i.e. including every top level folder (e.g. ‘rawdata’, ‘derivatives’, ‘code’, ‘analysis’).

Parameters:
  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If “never”, files on the target will never be overwritten by the source. If “always”, files on the target will be overwritten by the source if there is any difference in date or size. If “if_source_newer”, files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if file transfer were taking place, but no files will be moved. Useful to check which files will be moved on data transfer.

Return type:

None

download_entire_project(overwrite_existing_files='never', dry_run=False)

Download the entire project (from ‘central’ to ‘local’), i.e. including every top level folder (e.g. ‘rawdata’, ‘derivatives’, ‘code’, ‘analysis’).

Parameters:
  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If “never”, files on the target will never be overwritten by the source. If “always”, files on the target will be overwritten by the source if there is any difference in date or size. If “if_source_newer”, files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if file transfer were taking place, but no files will be moved. Useful to check which files will be moved on data transfer.

Return type:

None

upload_specific_folder_or_file(filepath, overwrite_existing_files='never', dry_run=False)

Upload a specific file or folder. If transferring a single file, the path including the filename is required (see the ‘filepath’ input). If transferring a folder, the wildcard “*” must be used to transfer all files in the folder, or “**” to transfer all files and sub-folders.

Parameters:
  • filepath (Union[str, Path]) – The full filepath, as a string or Path.

  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If “never”, files on the target will never be overwritten by the source. If “always”, files on the target will be overwritten by the source if there is any difference in date or size. If “if_source_newer”, files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if file transfer were taking place, but no files will be moved. Useful to check which files will be moved on data transfer.

Return type:

None
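The difference between the “*” and “**” wildcards can be seen with a local sketch using pathlib. files_matched() is a hypothetical helper that mirrors the documented meaning (files directly in the folder versus all files including those in sub-folders); it is not how datashuttle resolves filepaths.

```python
import tempfile
from pathlib import Path
from typing import List


def files_matched(folder: Path, wildcard: str) -> List[str]:
    """Hypothetical sketch of which files '*' and '**' select inside `folder`."""
    if wildcard == "*":
        # Files directly inside the folder only.
        matches = (p for p in folder.iterdir() if p.is_file())
    elif wildcard == "**":
        # All files, including those in nested sub-folders.
        matches = (p for p in folder.rglob("*") if p.is_file())
    else:
        raise ValueError("wildcard must be '*' or '**'")
    return sorted(p.relative_to(folder).as_posix() for p in matches)


with tempfile.TemporaryDirectory() as tmp:
    ses = Path(tmp) / "sub-001" / "ses-001"
    (ses / "ephys").mkdir(parents=True)
    (ses / "notes.txt").write_text("session notes")
    (ses / "ephys" / "recording.bin").write_text("data")
    only_top = files_matched(ses, "*")     # ["notes.txt"]
    recursive = files_matched(ses, "**")   # ["ephys/recording.bin", "notes.txt"]
```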

download_specific_folder_or_file(filepath, overwrite_existing_files='never', dry_run=False)

Download a specific file or folder. If transferring a single file, the path including the filename is required (see the ‘filepath’ input). If transferring a folder, the wildcard “*” must be used to transfer all files in the folder, or “**” to transfer all files and sub-folders.

Parameters:
  • filepath (Union[str, Path]) – The full filepath, as a string or Path.

  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If “never”, files on the target will never be overwritten by the source. If “always”, files on the target will be overwritten by the source if there is any difference in date or size. If “if_source_newer”, files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if file transfer were taking place, but no files will be moved. Useful to check which files will be moved on data transfer.

Return type:

None

setup_ssh_connection()

Set up a connection to the central server using SSH. Assumes the central_host_id and central_host_username are set in configs (see make_config_file() and update_config_file()).

First, the server key is displayed and the server ID must be verified. Once accepted, the host key is stored for all future use.

Next, you are prompted to enter your password for the central cluster. Once entered, an SSH private / public key pair is set up.

Return type:

None

write_public_key(filepath)

By default, only the SSH private key is stored, in the datashuttle configs folder. Use this function to save the public key.

Parameters:

filepath (str) – Full filepath (including the filename) to write the public key to.

Return type:

None

make_config_file(local_path, central_path, connection_method, central_host_id=None, central_host_username=None)

Initialise the configurations for datashuttle to use on the local machine. Once initialised, these settings will be used each time datashuttle is opened. This method can also be used to completely overwrite existing configs.

These settings are stored in a config file on the datashuttle path (not in the project folder) on the local machine. Use get_config_path() to get the full path to the saved config file.

Use update_config_file() to selectively update settings.

Parameters:
  • local_path (str) – Path to the project folder on the local machine.

  • central_path (str) – Filepath to the central project. If the connection is local (i.e. connection_method = “local_filesystem”), this is the full path on the local filesystem. Otherwise, if the connection is via SSH (i.e. connection_method = “ssh”), this is the path to the project folder on the central machine. This must be a full path to the central folder, i.e. it cannot include ~ home folder syntax and must contain the full path (e.g. /nfs/nhome/live/jziminski).

  • connection_method (str) – The method used to connect to the central project filesystem, e.g. “local_filesystem” (e.g. a mounted drive) or “ssh”.

  • central_host_id (Optional[str]) – Server address of the central host for an SSH connection, e.g. “ssh.swc.ucl.ac.uk”.

  • central_host_username (Optional[str]) – Username with which to log in to the central host, e.g. “jziminski”.

Return type:

None
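The argument rules above can be expressed as a few sanity checks. check_config_args() is a hypothetical helper, not part of datashuttle; it assumes only the two connection methods named here, and requires central_host_id and central_host_username for SSH because setup_ssh_connection() relies on both being set.

```python
from typing import Optional


def check_config_args(
    central_path: str,
    connection_method: str,
    central_host_id: Optional[str] = None,
    central_host_username: Optional[str] = None,
) -> None:
    """Hypothetical sketch of sanity checks on make_config_file() arguments."""
    # Assumption: only the two documented connection methods are accepted.
    if connection_method not in ("local_filesystem", "ssh"):
        raise ValueError(f"Unknown connection_method: {connection_method!r}")
    if connection_method == "ssh":
        # The central path must be a full path; "~" home folder syntax is not allowed.
        if central_path.startswith("~"):
            raise ValueError("central_path must be a full path, not use '~'")
        # An SSH connection needs both the host address and the username.
        if central_host_id is None or central_host_username is None:
            raise ValueError(
                "central_host_id and central_host_username are required for ssh"
            )
```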

update_config_file(**kwargs)
Return type:

None

get_local_path()

Get the project's local path.

Return type:

Path

get_central_path()

Get the project central path.

Return type:

Path

get_datashuttle_path()

Get the path to the local datashuttle folder where configs and other datashuttle files are stored.

Return type:

Path

get_config_path()

Get the full path to the DataShuttle config file.

Return type:

Path

get_configs()
Return type:

Configs

get_logging_path()

Get the path where datashuttle logs are written.

Return type:

Path

static get_existing_projects()

Get a list of existing project names found on the local machine. This is based on project folders in the “home / .datashuttle” folder that contain valid config.yaml files.

Return type:

List[Path]
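The lookup described above can be sketched with pathlib. find_projects() is a hypothetical helper; it assumes each project's config.yaml sits directly inside its project folder (e.g. ~/.datashuttle/<project_name>/config.yaml) and only checks that the file exists, not that it is a valid config.

```python
from pathlib import Path
from typing import List


def find_projects(datashuttle_folder: Path) -> List[Path]:
    """Hypothetical sketch: project folders that contain a config.yaml file."""
    if not datashuttle_folder.is_dir():
        return []
    return sorted(
        folder
        for folder in datashuttle_folder.iterdir()
        # Assumption: the config file lives directly inside the project folder.
        if folder.is_dir() and (folder / "config.yaml").is_file()
    )
```

For example, find_projects(Path.home() / ".datashuttle") lists the candidate project folders on the current machine.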

get_next_sub(top_level_folder, return_with_prefix=True, local_only=False)

Convenience function for get_next_sub_or_ses to find the next subject number.

Parameters:
  • top_level_folder (Literal['rawdata', 'derivatives']) – “rawdata” or “derivatives”

  • return_with_prefix (bool) – If True, return with the “sub-” prefix.

  • local_only (bool) – If True, only get names from local_path, otherwise from local_path and central_path.

Return type:

str

get_next_ses(top_level_folder, sub, return_with_prefix=True, local_only=False)

Convenience function for get_next_sub_or_ses to find the next session number.

Parameters:
  • top_level_folder (Literal['rawdata', 'derivatives']) – “rawdata” or “derivatives”

  • sub (Optional[str]) – Name of the subject to find the next session of.

  • return_with_prefix (bool) – If True, return with the “ses-” prefix.

  • local_only (bool) – If True, only get names from local_path, otherwise from local_path and central_path.

Return type:

str
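Finding the next subject or session number amounts to taking the highest existing number and adding one. next_number() is a hypothetical sketch working on a single list of folder names; gathering names from local_path (and central_path unless local_only=True) is left out. Reusing the existing digit width follows from the validation rule that digit lengths must be consistent; the default of "001" for an empty list is an assumption.

```python
import re
from typing import List


def next_number(existing: List[str], prefix: str, return_with_prefix: bool = True) -> str:
    """Hypothetical sketch: next sub / ses number from existing folder names."""
    pattern = re.compile(rf"^{prefix}-(\d+)")
    numbers = [m.group(1) for m in map(pattern.match, existing) if m is not None]
    if numbers:
        # Digit lengths must be consistent, so reuse the existing width.
        width = len(numbers[0])
        value = max(int(n) for n in numbers) + 1
    else:
        # Assumption: start at 1 with three digits when no folders exist yet.
        width, value = 3, 1
    formatted = f"{value:0{width}d}"
    return f"{prefix}-{formatted}" if return_with_prefix else formatted
```

For example, next_number(["sub-001", "sub-002_date-20220101", "sub-010"], "sub") returns "sub-011".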

get_name_templates()

Get the regexp templates used for validation. If the “on” key is set to False, template validation is not performed.

Return type:

Dict

Returns:

  • name_templates (Dict) – e.g. {"name_templates": {"on": False, "sub": None, "ses": None}}

set_name_templates(new_name_templates)

Update the persistent settings with new name templates.

Name templates are regular expressions. When name_templates[“on”] is set to True, “sub” and “ses” names are validated against the regexp contained in the dict.

Parameters:

new_name_templates (Dict) – e.g. {"name_templates": {"on": False, "sub": None, "ses": None}}, where “sub” or “ses” can be a regexp against which subject and session names, respectively, are validated.

Return type:

None
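Template checking with the inner {"on": ..., "sub": ..., "ses": ...} dict can be sketched with the standard re module. matches_template() is hypothetical, and requiring the whole name to match (re.fullmatch rather than a prefix match) is an assumption about how datashuttle applies the regexp.

```python
import re
from typing import Dict, Optional


def matches_template(name: str, prefix: str, templates: Dict[str, Optional[str]]) -> bool:
    """Hypothetical sketch: validate a "sub" or "ses" name against a name template."""
    if not templates.get("on", False):
        return True  # template validation is switched off
    regexp = templates.get(prefix)
    if regexp is None:
        return True  # no template set for this prefix
    # Assumption: the whole name must match the regexp.
    return re.fullmatch(regexp, name) is not None
```

With {"on": True, "sub": r"sub-\d{3}_id-[a-z]+", "ses": None}, "sub-001_id-mouse" passes, "sub-01_id-mouse" fails, and any session name passes.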

show_configs()

Print the current configs to the terminal.

Return type:

None

validate_project(top_level_folder, error_or_warn, local_only=False)

Perform validation on the project. This checks the subject and session level folders to ensure that:

  • the digit lengths are consistent (e.g. ‘sub-001’ with ‘sub-02’ is not allowed)

  • ‘sub-’ or ‘ses-’ is the first key of the sub / ses names

  • names only include integers, letters, dashes or underscores

  • names are checked against name templates (if set)

  • no duplicate names exist across the project (e.g. ‘sub-001’ and ‘sub-001_date-1010120’).

Parameters:
  • error_or_warn (Literal["error", "warn"]) – If “error”, an exception is raised if validation fails. Otherwise, warnings are shown.

  • local_only (bool) – If True, only the local project is validated. Otherwise, both local and central projects are validated.

Return type:

None
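Most of the checks listed above can be sketched for a flat list of folder names. validate_names() is a hypothetical helper, not datashuttle's validator: it covers the prefix, allowed-character, digit-length and duplicate checks (name templates are left out), and it mimics error_or_warn by raising ValueError or issuing a warning, which is an assumption about the exception and warning types.

```python
import re
import warnings
from typing import Dict, List, Literal


def validate_names(
    names: List[str],
    prefix: str,
    error_or_warn: Literal["error", "warn"] = "error",
) -> List[str]:
    """Hypothetical sketch of sub / ses folder-name checks."""
    problems: List[str] = []
    digit_lengths = set()
    by_number: Dict[int, List[str]] = {}
    for name in names:
        # "sub-" / "ses-" must come first, followed by the number.
        match = re.match(rf"{prefix}-(\d+)", name)
        if match is None:
            problems.append(f"{name}: must start with '{prefix}-' and a number")
            continue
        # Only letters, digits, dashes and underscores are allowed.
        if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
            problems.append(f"{name}: contains disallowed characters")
        digit_lengths.add(len(match.group(1)))
        by_number.setdefault(int(match.group(1)), []).append(name)
    if len(digit_lengths) > 1:
        problems.append(f"inconsistent digit lengths: {sorted(digit_lengths)}")
    # The same number under different names counts as a duplicate.
    for number, group in by_number.items():
        if len(set(group)) > 1:
            problems.append(f"duplicate {prefix} number {number}: {sorted(set(group))}")
    if problems:
        message = "\n".join(problems)
        if error_or_warn == "error":
            raise ValueError(message)
        warnings.warn(message)
    return problems
```

For example, ["sub-001", "sub-001_date-1010120"] raises with a duplicate report, while ["sub-001", "sub-02"] is flagged for inconsistent digit lengths.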

static check_name_formatting(names, prefix)

Pass a list of names to check how they will be auto-formatted, for example when passed to create_folders(), upload_custom() or download_custom().

Useful for checking tags, e.g. @TO@, @DATE@, @TIME@, @DATETIME@. This method will print the formatted list of names.

Parameters:
  • names (Union[str, list]) – A string or list of subject or session names.

  • prefix (Literal['sub', 'ses']) – The relevant subject or session prefix, i.e. “sub” or “ses”.

Return type:

None