API Reference

class datashuttle.DataShuttle(project_name, print_startup_message=True)

DataShuttle is a tool for neuroscience project management and data transfer.

Methods

check_name_formatting(names, prefix)

Format a list of subject or session names.

create_folders(top_level_folder, sub_names)

Create a folder tree in the project folder.

download_custom(top_level_folder, sub_names, ...)

Download data from the central project to the local project folder.

download_derivatives([...])

Download all files in the derivatives top level folder.

download_entire_project([...])

Download the entire project.

download_rawdata([overwrite_existing_files, ...])

Download all files in the rawdata top level folder.

download_specific_folder_or_file(filepath[, ...])

Download a specific file or folder.

get_central_path()

Return the project central path.

get_config_path()

Return the full path to the DataShuttle config file.

get_configs()

Return the datashuttle configs.

get_datashuttle_path()

Return the path to the local datashuttle folder.

get_existing_projects()

Return a list of existing project names found on the local machine.

get_local_path()

Return the project's local path.

get_logging_path()

Return the path where datashuttle logs are written.

get_name_templates()

Return the regexp templates used for validation.

get_next_ses(top_level_folder, sub[, ...])

Return the next session number.

get_next_sub(top_level_folder[, ...])

Return the next subject number.

is_local_project()

Return a bool indicating whether the project is 'local only'.

make_config_file(local_path[, central_path, ...])

Initialize the configurations for datashuttle on the local machine.

set_name_templates(new_name_templates)

Update the persistent settings with new name templates.

setup_ssh_connection()

Set up a connection to the central server using SSH.

show_configs()

Print the current configs to the terminal.

update_config_file(**kwargs)

Update the configuration file.

upload_custom(top_level_folder, sub_names, ...)

Upload data from a local project to the central project folder.

upload_derivatives([...])

Upload all files in the derivatives top level folder.

upload_entire_project([...])

Upload the entire project.

upload_rawdata([overwrite_existing_files, ...])

Upload all files in the rawdata top level folder.

upload_specific_folder_or_file(filepath[, ...])

Upload a specific file or folder.

validate_project(top_level_folder, display_mode)

Perform validation on the project.

write_public_key(filepath)

Save the public SSH key to a specified filepath.

create_folders(top_level_folder, sub_names, ses_names=None, datatype='', bypass_validation=False, log=True)

Create a folder tree in the project folder.

The passed names are initially formatted and validated, then folders are created.

Parameters:
  • top_level_folder (Literal['rawdata', 'derivatives']) – Whether to make the folders within rawdata or derivatives.

  • sub_names (Union[str, List[str]]) – Subject name / list of subject names to make within the top-level project folder. If not already present, the “sub-” prefix will be added.

  • ses_names (Union[str, List[str], None]) – Session name / list of session names. If not already present, the “ses-” prefix will be added. If no session is provided, no session-level folders are made.

  • datatype (Union[str, List[str]]) – The datatype(s) to make in the sub / ses folders (e.g. “ephys”, “behav”, “anat”). If “” is passed, no datatype folder will be created. Broad or narrow NeuroBlueprint datatypes are accepted.

  • bypass_validation (bool) – If True, folders will be created even if they do not conform to the NeuroBlueprint style.

  • log (bool) – If True, details of folder creation will be logged.

Returns:

A dictionary of the full filepaths made during folder creation. The keys are the type of folder made and the values are lists of created folder paths (Path objects). If datatype folders were created, the dict keys separate the created folders by datatype name. Similarly, if only subject or session level folders were created, these are separated by “sub” and “ses” keys.

Return type:

created_paths

Notes

sub_names or ses_names may contain the following formatting tags:

  • @TO@ – used to make a range of subjects / sessions. The boundaries of the range must be on either side of the tag, e.g. sub-001@TO@003 will generate [“sub-001”, “sub-002”, “sub-003”].

  • @DATE@, @TIME@, @DATETIME@ – will add date-<value>, time-<value> or date-<value>_time-<value> keys respectively. Only one is permitted per name, e.g. sub-001_@DATE@ will generate sub-001_date-20220101 (on 1 January 2022).
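The tag rules above can be illustrated with a small plain-Python sketch. expand_to_tag and expand_date_tags are hypothetical helpers written for this page, not datashuttle functions. They assume that the zero-padding of the lower range boundary is kept and that times are formatted as HHMMSS. Use check_name_formatting() to see how datashuttle itself formats a given name.

```python
import re
from datetime import datetime
from typing import List


def expand_to_tag(name: str) -> List[str]:
    """Expand e.g. "sub-001@TO@003" into ["sub-001", "sub-002", "sub-003"].

    Hypothetical helper: the digits directly before and after @TO@ are taken
    as the inclusive range boundaries.
    """
    match = re.fullmatch(r"(.*?)(\d+)@TO@(\d+)(.*)", name)
    if match is None:
        return [name]  # no @TO@ tag, nothing to expand
    prefix, start, stop, suffix = match.groups()
    width = len(start)  # keep the zero-padding of the lower boundary
    return [f"{prefix}{i:0{width}d}{suffix}" for i in range(int(start), int(stop) + 1)]


def expand_date_tags(name: str, now: datetime) -> str:
    """Replace a single @DATE@, @TIME@ or @DATETIME@ tag (hypothetical helper)."""
    replacements = {
        "@DATETIME@": f"date-{now:%Y%m%d}_time-{now:%H%M%S}",
        "@DATE@": f"date-{now:%Y%m%d}",
        "@TIME@": f"time-{now:%H%M%S}",  # HHMMSS is an assumed format
    }
    n_tags = sum(name.count(tag) for tag in replacements)
    if n_tags > 1:
        raise ValueError("Only one date / time tag is permitted per name.")
    for tag, value in replacements.items():
        name = name.replace(tag, value)
    return name
```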

Examples

project.create_folders("rawdata", "sub-001", datatype="behav")

project.create_folders("rawdata", "sub-002@TO@005", ["ses-001", "ses-002"], ["ephys", "behav"])

upload_custom(top_level_folder, sub_names, ses_names, datatype='all', overwrite_existing_files='never', dry_run=False, init_log=True)

Upload data from a local project to the central project folder.

Parameters:
  • top_level_folder (Literal['rawdata', 'derivatives']) – The top-level folder (e.g. “rawdata”, “derivatives”) to transfer within.

  • sub_names (Union[str, list]) – A subject name / list of subject names. If a name is not already prefixed with "sub-", the prefix will be added automatically. "@*@" can be used as a wildcard. "all" will search for all sub-folders in the datatype folder to upload.

  • ses_names (Union[str, list]) – A session name / list of session names, similar to sub_names but requiring a "ses-" prefix.

  • datatype (Union[List[str], str]) – The (broad or narrow) NeuroBlueprint datatypes to transfer. If "all", any broad or narrow datatype folder will be transferred.

  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on the target will never be overwritten by the source. If "always", files on the target will be overwritten by the source if there is any difference in date or size. If "if_source_newer", files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer were taking place, but no files will be moved.

  • init_log (bool) – Whether to handle logging. This should always be True, unless the logger is handled elsewhere (e.g. in a calling function).
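The three overwrite_existing_files modes can be summarised as a decision rule. The sketch below is not datashuttle's transfer code: FileInfo and should_overwrite are hypothetical. It also assumes that a file missing on the target is always transferred, whatever the mode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    """Hypothetical minimal description of one file."""

    size: int  # size in bytes
    mtime: float  # modification time, seconds since the epoch


def should_overwrite(mode: str, source: FileInfo, target: Optional[FileInfo]) -> bool:
    """Decide whether source should be copied over target for a given mode."""
    if target is None:
        return True  # assumed: files absent on the target are always transferred
    if mode == "never":
        return False
    if mode == "always":
        # any difference in size or date triggers an overwrite
        return source.size != target.size or source.mtime != target.mtime
    if mode == "if_source_newer":
        return source.mtime > target.mtime
    raise ValueError(f"Unknown overwrite_existing_files mode: {mode!r}")
```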

Return type:

None

download_custom(top_level_folder, sub_names, ses_names, datatype='all', overwrite_existing_files='never', dry_run=False, init_log=True)

Download data from the central project to the local project folder.

Parameters:
  • top_level_folder (Literal['rawdata', 'derivatives']) – The top-level folder (e.g. “rawdata”, “derivatives”) to transfer within.

  • sub_names (Union[str, list]) – A subject name / list of subject names. If a name is not already prefixed with "sub-", the prefix will be added automatically. "@*@" can be used as a wildcard. "all" will search for all sub-folders in the datatype folder to download.

  • ses_names (Union[str, list]) – A session name / list of session names, similar to sub_names but requiring a "ses-" prefix.

  • datatype (Union[List[str], str]) – The (broad or narrow) NeuroBlueprint datatypes to transfer. If "all", any broad or narrow datatype folder will be transferred.

  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on the target will never be overwritten by the source. If "always", files on the target will be overwritten by the source if there is any difference in date or size. If "if_source_newer", files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer were taking place, but no files will be moved.

  • init_log (bool) – Whether to handle logging. This should always be True, unless the logger is handled elsewhere (e.g. in a calling function).
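The "@*@" wildcard in sub_names and ses_names can be read as matching any run of characters within a folder name. match_names below is a hypothetical helper that illustrates this assumption against a list of folder names. It is not part of datashuttle.

```python
import re
from typing import List


def match_names(pattern: str, names: List[str]) -> List[str]:
    """Return the names matched by pattern (hypothetical helper)."""
    if pattern == "all":
        return list(names)
    # Treat everything literally except "@*@", which matches any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("@*@"))
    return [name for name in names if re.fullmatch(regex, name)]
```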

Return type:

None

upload_rawdata(overwrite_existing_files='never', dry_run=False)

Upload all files in the rawdata top level folder.

Parameters:
  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on the target will never be overwritten by the source. If "always", files on the target will be overwritten by the source if there is any difference in date or size. If "if_source_newer", files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer were taking place, but no files will be moved.
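Because dry_run reports what would be transferred without moving any files, it pairs naturally with a confirmation step. upload_rawdata_with_preview is a hypothetical wrapper. It assumes project is a configured DataShuttle instance and calls only upload_rawdata() with the parameters documented above.

```python
def upload_rawdata_with_preview(project, overwrite_existing_files="never", confirm=input):
    """Dry-run the rawdata upload, ask for confirmation, then upload (hypothetical)."""
    # First pass: report what would be transferred, without moving files.
    project.upload_rawdata(overwrite_existing_files=overwrite_existing_files, dry_run=True)
    answer = confirm("Proceed with the upload? [y/N] ")
    if answer.strip().lower() != "y":
        return False  # user declined, nothing was transferred
    # Second pass: the real transfer, using the same overwrite policy.
    project.upload_rawdata(overwrite_existing_files=overwrite_existing_files, dry_run=False)
    return True
```

The confirm argument defaults to input() but can be replaced, for example in automated scripts.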

Return type:

None

upload_derivatives(overwrite_existing_files='never', dry_run=False)

Upload all files in the derivatives top level folder.

Parameters:
  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on the target will never be overwritten by the source. If "always", files on the target will be overwritten by the source if there is any difference in date or size. If "if_source_newer", files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer were taking place, but no files will be moved.

Return type:

None

download_rawdata(overwrite_existing_files='never', dry_run=False)

Download all files in the rawdata top level folder.

Parameters:
  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on the target will never be overwritten by the source. If "always", files on the target will be overwritten by the source if there is any difference in date or size. If "if_source_newer", files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer were taking place, but no files will be moved.

Return type:

None

download_derivatives(overwrite_existing_files='never', dry_run=False)

Download all files in the derivatives top level folder.

Parameters:
  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on the target will never be overwritten by the source. If "always", files on the target will be overwritten by the source if there is any difference in date or size. If "if_source_newer", files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer were taking place, but no files will be moved.

Return type:

None

upload_entire_project(overwrite_existing_files='never', dry_run=False)

Upload the entire project.

Includes every top level folder (e.g. rawdata, derivatives).

Parameters:
  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on the target will never be overwritten by the source. If "always", files on the target will be overwritten by the source if there is any difference in date or size. If "if_source_newer", files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer were taking place, but no files will be moved.

Return type:

None

download_entire_project(overwrite_existing_files='never', dry_run=False)

Download the entire project.

Includes every top level folder (e.g. rawdata, derivatives).

Parameters:
  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on the target will never be overwritten by the source. If "always", files on the target will be overwritten by the source if there is any difference in date or size. If "if_source_newer", files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer were taking place, but no files will be moved.

Return type:

None

upload_specific_folder_or_file(filepath, overwrite_existing_files='never', dry_run=False)

Upload a specific file or folder.

If transferring a single file, the path including the filename is required (see the ‘filepath’ input). If transferring a folder, the wildcard “*” or “**” must be used to transfer all files in the folder (“*”) or all files and sub-folders (“**”).
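The difference between “*” and “**” can be sketched by translating a pattern into a regular expression. The sketch assumes that “*” does not cross a “/” while “**” does. wildcard_to_regex is a hypothetical helper for illustration only; the actual matching is performed inside datashuttle's transfer machinery.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate "*" and "**" in a path pattern into a regex (hypothetical helper).

    Assumes "*" matches any characters except "/" (files directly in a folder)
    and "**" also matches "/" (files in all sub-folders).
    """
    out = []
    # The capturing group keeps the wildcards; "**" is tried before "*".
    for part in re.split(r"(\*\*|\*)", pattern):
        if part == "**":
            out.append(".*")
        elif part == "*":
            out.append("[^/]*")
        else:
            out.append(re.escape(part))
    return "".join(out)
```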

Parameters:
  • filepath (Union[str, Path]) – The full filepath (str or Path).

  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on the target will never be overwritten by the source. If "always", files on the target will be overwritten by the source if there is any difference in date or size. If "if_source_newer", files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer were taking place, but no files will be moved.

Return type:

None

download_specific_folder_or_file(filepath, overwrite_existing_files='never', dry_run=False)

Download a specific file or folder.

If transferring a single file, the path including the filename is required (see the ‘filepath’ input). If transferring a folder, the wildcard “*” or “**” must be used to transfer all files in the folder (“*”) or all files and sub-folders (“**”).

Parameters:
  • filepath (Union[str, Path]) – The full filepath (str or Path).

  • overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on the target will never be overwritten by the source. If "always", files on the target will be overwritten by the source if there is any difference in date or size. If "if_source_newer", files on the target will only be overwritten by source files with a newer creation / modification datetime.

  • dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer were taking place, but no files will be moved.

Return type:

None

setup_ssh_connection()

Set up a connection to the central server using SSH.

Assumes that central_host_id and central_host_username are set in the configs (see make_config_file() and update_config_file()). First, the server key will be displayed, requiring verification of the server ID. The host key is then stored for all future use.

Next, the user is prompted to input their password for the central cluster. Once it is entered, an SSH private / public key pair will be set up.

Return type:

None

write_public_key(filepath)

Save the public SSH key to a specified filepath.

By default, only the SSH private key is stored in the datashuttle configs folder. Use this function to save the public key.

Parameters:

filepath (str) – Full filepath (including filename) to write the public key to.

Return type:

None

make_config_file(local_path, central_path=None, connection_method=None, central_host_id=None, central_host_username=None)

Initialize the configurations for datashuttle on the local machine.

Once initialised, these settings will be used each time datashuttle is opened.

These settings are stored in a config file on the datashuttle path (not in the project folder) on the local machine. Use get_config_path() to get the full path to the saved config file.

Use update_config_file() to selectively update settings.

Parameters:
  • local_path (str) – Path to the project folder on the local machine.

  • central_path (str | None) – Filepath to the central project. If the connection is local (i.e. connection_method = "local_filesystem"), this is the full path on the local filesystem. Otherwise, if the connection is via SSH (i.e. connection_method = "ssh"), this is the path to the project folder on the central machine. This must be a full path to the central folder; it cannot use ~ home folder syntax (e.g. /nfs/nhome/live/jziminski).

  • connection_method (str | None) – The method used to connect to the central project filesystem, e.g. "local_filesystem" (e.g. a mounted drive) or "ssh".

  • central_host_id (Optional[str]) – Server address of the central host for the SSH connection, e.g. "ssh.swc.ucl.ac.uk".

  • central_host_username (Optional[str]) – Username with which to log in to the central host, e.g. "jziminski".
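The following is a minimal sketch of configuring an SSH project and then creating the key pair with setup_ssh_connection(). configure_ssh_project is a hypothetical wrapper. Its path check encodes the rule above that central_path must be a full path without ~. All paths, host names and user names are placeholders.

```python
from pathlib import PurePosixPath


def configure_ssh_project(project, local_path, central_path, central_host_id, central_host_username):
    """Write SSH configs, then set up the SSH key pair (hypothetical wrapper)."""
    # central_path must be a full path on the central machine: no "~" home syntax.
    if central_path.startswith("~") or not PurePosixPath(central_path).is_absolute():
        raise ValueError(f"central_path must be a full path, got {central_path!r}")
    project.make_config_file(
        local_path,
        central_path=central_path,
        connection_method="ssh",
        central_host_id=central_host_id,
        central_host_username=central_host_username,
    )
    # Interactive step: verifies the server key and prompts for the password.
    project.setup_ssh_connection()
```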

Return type:

None

update_config_file(**kwargs)

Update the configuration file.

Return type:

None

get_local_path()

Return the project's local path.

Return type:

Path

get_central_path()

Return the project's central path.

Return type:

Path

get_datashuttle_path()

Return the path to the local datashuttle folder.

This is where configs and other datashuttle files are stored.

Return type:

Path

get_config_path()

Return the full path to the DataShuttle config file.

Return type:

Path

get_configs()

Return the datashuttle configs.

Return type:

Configs

get_logging_path()

Return the path where datashuttle logs are written.

Return type:

Path

static get_existing_projects()

Return a list of existing project names found on the local machine.

This is based on project folders in the “home / .datashuttle” folder that contain valid config.yaml files.

Return type:

List[Path]

get_next_sub(top_level_folder, return_with_prefix=True, include_central=False)

Return the next subject number.

Parameters:
  • top_level_folder (Literal['rawdata', 'derivatives']) – The top-level folder, “rawdata” or “derivatives”.

  • return_with_prefix (bool) – If True, return the subject with the “sub-” prefix.

  • include_central (bool) – If False, only get names from local_path; otherwise, from local_path and central_path. If in local-project mode, this flag is ignored.

Return type:

The next subject ID.
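One plausible way to derive the next subject ID is to take the largest existing subject number, add one and keep the existing zero-padding. next_sub is a hypothetical sketch of that rule applied to a list of folder names; with include_central=True, the central names would simply be added to the list. The default of sub-001 for an empty project is an assumption.

```python
import re
from typing import List


def next_sub(existing_names: List[str], return_with_prefix: bool = True) -> str:
    """Hypothetical sketch: largest existing subject number plus one."""
    numbers = []
    widths = []
    for name in existing_names:
        match = re.match(r"sub-(\d+)", name)
        if match:
            numbers.append(int(match.group(1)))
            widths.append(len(match.group(1)))
    if numbers:
        value, width = max(numbers) + 1, max(widths)
    else:
        value, width = 1, 3  # assumed default for a project with no subjects yet
    formatted = f"{value:0{width}d}"
    return f"sub-{formatted}" if return_with_prefix else formatted
```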

get_next_ses(top_level_folder, sub, return_with_prefix=True, include_central=False)

Return the next session number.

Parameters:
  • top_level_folder (Literal['rawdata', 'derivatives']) – The top-level folder, “rawdata” or “derivatives”.

  • sub (str) – Name of the subject to find the next session of.

  • return_with_prefix (bool) – If True, return with the “ses-” prefix.

  • include_central (bool) – If False, only get names from local_path, otherwise from local_path and central_path. If in local-project mode, this flag is ignored.

Return type:

The next session ID.

is_local_project()

Return a bool indicating whether the project is ‘local-only’.

A project is ‘local-only’ if it has no central_path and connection_method. Such a project can be used to create and validate folders, but not to transfer data.
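Local-only projects cannot transfer data, so a script that may run against either kind of project can check is_local_project() first. upload_if_possible is a hypothetical wrapper. As a safety choice, it defaults to a dry run.

```python
def upload_if_possible(project, dry_run=True):
    """Upload the entire project unless it is local-only (hypothetical wrapper)."""
    if project.is_local_project():
        # No central_path / connection_method is configured, so there is no target.
        print("Local-only project: skipping transfer.")
        return False
    project.upload_entire_project(overwrite_existing_files="never", dry_run=dry_run)
    return True
```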

Return type:

bool

get_name_templates()

Return the regexp templates used for validation.

If the “on” key is set to False, template validation is not performed.

Returns:

e.g. {"name_templates": {"on": False, "sub": None, "ses": None}}

Return type:

name_templates

set_name_templates(new_name_templates)

Update the persistent settings with new name templates.

Name templates are regular expressions against which "sub" and "ses" names are validated when name_templates["on"] is set to True.

Parameters:

new_name_templates (Dict) – e.g. {"name_templates": {"on": False, "sub": None, "ses": None}} where "sub" or "ses" can be a regexp that subject and session names respectively are validated against.
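The sketch below shows template validation with Python's re module. It assumes that a name must match the whole regexp (re.fullmatch). template_issues is a hypothetical helper, not the function datashuttle uses internally.

```python
import re
from typing import List, Optional


def template_issues(names: List[str], template: Optional[str], on: bool) -> List[str]:
    """Return the names that do not fully match the regexp template (hypothetical)."""
    if not on or template is None:
        return []  # templates switched off, or no template set for this level
    return [name for name in names if re.fullmatch(template, name) is None]
```

For example, the template sub-\d\d\d(_id-\w+)? accepts sub-003_id-abc but rejects sub-02.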

Return type:

None

show_configs()

Print the current configs to the terminal.

Return type:

None

validate_project(top_level_folder, display_mode, include_central=False, strict_mode=False)

Perform validation on the project.

This checks the subject and session level folders to ensure there are no NeuroBlueprint formatting issues.

Parameters:
  • top_level_folder (Optional[Literal['rawdata', 'derivatives']]) – Folder to check, either "rawdata" or "derivatives". If None, will check both folders.

  • display_mode (Literal['error', 'warn', 'print']) – How validation issues are displayed: "error" (raise an error), "warn" (show a warning) or "print".

  • include_central (bool) – If False, only the local project is validated. Otherwise, both local and central projects are validated. If in local-project mode, this flag is ignored.

  • strict_mode (bool) – If True, only allow NeuroBlueprint-formatted folders to exist in the project. By default, non-NeuroBlueprint folders (e.g. a folder called ‘my_stuff’ in ‘rawdata’) are allowed, and only folders starting with the sub- or ses- prefix are checked. In strict mode, any folder not prefixed with sub-, ses- or a valid datatype will raise a validation issue.
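The sketch below shows how strict_mode changes which folders are flagged merely for existing. unexpected_folders and EXAMPLE_DATATYPES are hypothetical. The datatype set is a small illustrative subset. The sketch does not model the formatting checks that are always applied to sub- and ses- folders.

```python
from typing import Iterable, List

# Hypothetical, deliberately incomplete set of datatype folder names.
EXAMPLE_DATATYPES = {"ephys", "behav", "anat"}


def unexpected_folders(folder_names: Iterable[str], strict_mode: bool) -> List[str]:
    """Return folders flagged only because they are not NeuroBlueprint folders."""
    if not strict_mode:
        return []  # e.g. "my_stuff" is allowed outside strict mode
    return [
        name
        for name in folder_names
        if not name.startswith(("sub-", "ses-")) and name not in EXAMPLE_DATATYPES
    ]
```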

Returns:

A list of validation errors found in the project.

Return type:

error_messages

static check_name_formatting(names, prefix)

Format a list of subject or session names.

Pass a list of names to check how they will be auto-formatted, for example when passed to create_folders() or upload_custom().

This is useful for checking tags, e.g. @TO@, @DATE@, @TIME@, @DATETIME@. This method will print the formatted list of names.

Parameters:
  • names (Union[str, list]) – A string or list of subject or session names.

  • prefix (Literal['sub', 'ses']) – The relevant subject or session prefix, i.e. "sub" or "ses".

Return type:

None

datashuttle.quick_validate_project(project_path, top_level_folder='rawdata', display_mode='warn', strict_mode=False, name_templates=None)

Perform validation on a NeuroBlueprint-formatted project.

Parameters:
  • project_path (str | Path) – Path to the project to validate. Must include the project name, and hold a “rawdata” or “derivatives” folder.

  • top_level_folder (Optional[Literal['rawdata', 'derivatives']]) – The top-level folder (“rawdata” or “derivatives”) on which to perform validation. If None, both are checked.

  • display_mode (Literal['error', 'warn', 'print']) – The validation issues are displayed as "error" (raise error), "warn" (show warning), or "print".

  • strict_mode (bool) – If True, only allow NeuroBlueprint-formatted folders to exist in the project. By default, non-NeuroBlueprint folders (e.g. a folder called ‘my_stuff’ in ‘rawdata’) are allowed, and only folders starting with the sub- or ses- prefix are checked. In strict mode, any folder not prefixed with sub-, ses- or a valid datatype will raise a validation issue.

  • name_templates (Optional[Dict]) – A dictionary of templates for subject and session name to validate against. See DataShuttle.set_name_templates() for details.

Returns:

A list of validation errors found in the project.

Return type:

error_messages