API Reference
- class datashuttle.DataShuttle(project_name, print_startup_message=True)
DataShuttle is a tool for neuroscience project management and data transfer.
Methods
check_name_formatting(names, prefix) – Format a list of subject or session names.
create_folders(top_level_folder, sub_names) – Create a folder tree in the project folder.
download_custom(top_level_folder, sub_names, ...) – Download data from the central project to the local project folder.
download_derivatives([...]) – Download all files in the derivatives top level folder.
download_entire_project([...]) – Download the entire project.
download_rawdata([overwrite_existing_files, ...]) – Download all files in the rawdata top level folder.
download_specific_folder_or_file(filepath[, ...]) – Download a specific file or folder.
get_central_path() – Return the project central path.
get_config_path() – Return the full path to the DataShuttle config file.
get_configs() – Return the datashuttle configs.
get_datashuttle_path() – Return the path to the local datashuttle folder.
get_existing_projects() – Return a list of existing project names found on the local machine.
get_local_path() – Return the project's local path.
get_logging_path() – Return the path where datashuttle logs are written.
get_name_templates() – Return the regexp templates used for validation.
get_next_ses(top_level_folder, sub[, ...]) – Return the next session number.
get_next_sub(top_level_folder[, ...]) – Return the next subject number.
is_local_project() – Return a bool indicating whether the project is 'local only'.
make_config_file(local_path[, central_path, ...]) – Initialize the configurations for datashuttle on the local machine.
set_name_templates(new_name_templates) – Update the persistent settings with new name templates.
setup_ssh_connection() – Set up a connection to the central server using SSH.
show_configs() – Print the current configs to the terminal.
update_config_file(**kwargs) – Update the configuration file.
upload_custom(top_level_folder, sub_names, ...) – Upload data from a local project to the central project folder.
upload_derivatives([...]) – Upload all files in the derivatives top level folder.
upload_entire_project([...]) – Upload the entire project.
upload_rawdata([overwrite_existing_files, ...]) – Upload all files in the rawdata top level folder.
upload_specific_folder_or_file(filepath[, ...]) – Upload a specific file or folder.
validate_project(top_level_folder, display_mode) – Perform validation on the project.
write_public_key(filepath) – Save the public SSH key to a specified filepath.
- create_folders(top_level_folder, sub_names, ses_names=None, datatype='', bypass_validation=False, log=True)
Create a folder tree in the project folder.
The passed names are initially formatted and validated, then folders are created.
- Parameters:
top_level_folder (Literal['rawdata', 'derivatives']) – Whether to make the folders within rawdata or derivatives.
sub_names (Union[str, List[str]]) – Subject name or list of subject names to make within the top-level project folder. If not already present, the "sub-" prefix is added.
ses_names (Union[str, List[str], None]) – Session name or list of session names. If not already present, the "ses-" prefix is added. If no session is provided, no session-level folders are made.
datatype (Union[str, List[str]]) – The datatype to make in the sub / ses folders (e.g. "ephys", "behav", "anat"). If "" is passed, no datatype folders are created. Broad or narrow NeuroBlueprint datatypes are accepted.
bypass_validation (bool) – If True, folders are created even if their names do not conform to the NeuroBlueprint style.
log (bool) – If True, details of folder creation are logged.
- Returns:
A dictionary of the full filepaths made during folder creation, where the keys are the type of folder made and the values are lists of created folder paths (Path objects). If datatype folders were created, the dict keys separate the created folders by datatype name. Similarly, if only subject or session level folders were created, these are separated by "sub" and "ses" keys.
- Return type:
created_paths
Notes
sub_names or ses_names may contain formatting tags:
- @TO@: used to make a range of subjects / sessions. The boundaries of the range must be on either side of the tag, e.g. sub-001@TO@003 will generate ["sub-001", "sub-002", "sub-003"].
- @DATE@, @TIME@, @DATETIME@: add date-<value>, time-<value> or date-<value>_time-<value> keys respectively. Only one per name is permitted, e.g. sub-001_@DATE@ will generate sub-001_date-20220101 (on 1 January 2022).
Examples
project.create_folders("rawdata", "sub-001", datatype="behav")
project.create_folders("rawdata", "sub-002@TO@005", ["ses-001", "ses-002"], ["ephys", "behav"])
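The tag behaviour described in the Notes can be sketched in plain Python. This is an illustration only, not datashuttle's own implementation: expand_to_tag and fill_date_tag are hypothetical helper names, and the date format is inferred from the sub-001_date-20220101 example above.

```python
import re
from datetime import datetime


def expand_to_tag(name: str) -> list[str]:
    """Expand a name such as 'sub-001@TO@003' into the full range of names."""
    match = re.fullmatch(r"(.*?)(\d+)@TO@(\d+)(.*)", name)
    if match is None:
        return [name]  # no @TO@ tag, nothing to expand
    head, start, stop, tail = match.groups()
    width = len(start)  # assumption: keep the zero-padding of the left boundary
    return [
        f"{head}{number:0{width}d}{tail}"
        for number in range(int(start), int(stop) + 1)
    ]


def fill_date_tag(name: str, now: datetime) -> str:
    """Replace an @DATE@ tag with a date-<value> key (YYYYMMDD)."""
    return name.replace("@DATE@", "date-" + now.strftime("%Y%m%d"))


subjects = expand_to_tag("sub-001@TO@003")
dated = fill_date_tag("sub-001_@DATE@", datetime(2022, 1, 1))
```

To see how datashuttle itself formats a set of names, pass them to check_name_formatting().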
- upload_custom(top_level_folder, sub_names, ses_names, datatype='all', overwrite_existing_files='never', dry_run=False, init_log=True)
Upload data from a local project to the central project folder.
- Parameters:
top_level_folder (Literal['rawdata', 'derivatives']) – The top-level folder (e.g. "rawdata", "derivatives") to transfer within.
sub_names (Union[str, list]) – A subject name or list of subject names. These must be prefixed with "sub-", or the prefix will be added automatically. "@*@" can be used as a wildcard. "all" will search for all sub-folders in the datatype folder to upload.
ses_names (Union[str, list]) – A session name or list of session names, similar to sub_names but requiring a "ses-" prefix.
datatype (Union[List[str], str]) – The (broad or narrow) NeuroBlueprint datatypes to transfer. If "all", any broad or narrow datatype folder will be transferred.
overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on target will never be overwritten by source. If "always", files on target will be overwritten by source if there is any difference in date or size. If "if_source_newer", files on target will only be overwritten by files on source with a newer creation / modification datetime.
dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer was taking place, but no files will be moved.
init_log (bool) – Whether to handle logging. This should always be True, unless the logger is handled elsewhere (e.g. in a calling function).
- Return type:
None
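The three overwrite_existing_files settings can be read as a small decision rule. The sketch below restates the parameter description in plain Python; it is not how datashuttle performs transfers, and FileInfo and should_overwrite are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical stand-in for file metadata (not a datashuttle type)."""

    modified: float  # modification time, seconds since the epoch
    size: int  # size in bytes


def should_overwrite(policy: str, source: FileInfo, target: FileInfo) -> bool:
    """Decide whether an existing target file would be replaced by the source."""
    if policy == "never":
        return False
    if policy == "always":
        # Overwrite on any difference in date or size.
        return source.modified != target.modified or source.size != target.size
    if policy == "if_source_newer":
        # Overwrite only when the source is strictly newer than the target.
        return source.modified > target.modified
    raise ValueError(f"Unknown overwrite policy: {policy!r}")


older = FileInfo(modified=100.0, size=10)
newer = FileInfo(modified=200.0, size=10)
```

With the default "never", a transfer only adds files that do not yet exist on the target. Combining any setting with dry_run=True previews the transfer without moving files.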
- download_custom(top_level_folder, sub_names, ses_names, datatype='all', overwrite_existing_files='never', dry_run=False, init_log=True)
Download data from the central project to the local project folder.
- Parameters:
top_level_folder (Literal['rawdata', 'derivatives']) – The top-level folder (e.g. "rawdata", "derivatives") to transfer within.
sub_names (Union[str, list]) – A subject name or list of subject names. These must be prefixed with "sub-", or the prefix will be added automatically. "@*@" can be used as a wildcard. "all" will search for all sub-folders in the datatype folder to download.
ses_names (Union[str, list]) – A session name or list of session names, similar to sub_names but requiring a "ses-" prefix.
datatype (Union[List[str], str]) – The (broad or narrow) NeuroBlueprint datatypes to transfer. If "all", any broad or narrow datatype folder will be transferred.
overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on target will never be overwritten by source. If "always", files on target will be overwritten by source if there is any difference in date or size. If "if_source_newer", files on target will only be overwritten by files on source with a newer creation / modification datetime.
dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer was taking place, but no files will be moved.
init_log (bool) – Whether to handle logging. This should always be True, unless the logger is handled elsewhere (e.g. in a calling function).
- Return type:
None
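The sub_names and ses_names rules (automatic prefix, "@*@" wildcard) can be illustrated without a project on disk. This is a rough sketch under the assumption that "@*@" behaves like a shell-style "*"; add_prefix and match_names are hypothetical helpers, and the real search is carried out by datashuttle against the project folders.

```python
from fnmatch import fnmatchcase


def add_prefix(name: str, prefix: str) -> str:
    """Add the 'sub-' or 'ses-' prefix if the name does not already have it."""
    return name if name.startswith(prefix) else prefix + name


def match_names(requested, existing_folders, prefix="sub-"):
    """Return the existing folder names selected by the requested names."""
    if isinstance(requested, str):
        requested = [requested]
    # Assumption: the @*@ tag matches any run of characters.
    patterns = [add_prefix(name, prefix).replace("@*@", "*") for name in requested]
    return [
        folder
        for folder in existing_folders
        if any(fnmatchcase(folder, pattern) for pattern in patterns)
    ]


existing = ["sub-001_id-a", "sub-002_id-b", "sub-010"]
selected = match_names(["001@*@", "sub-010"], existing)
```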
- upload_rawdata(overwrite_existing_files='never', dry_run=False)
Upload all files in the rawdata top level folder.
- Parameters:
overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on target will never be overwritten by source. If "always", files on target will be overwritten by source if there is any difference in date or size. If "if_source_newer", files on target will only be overwritten by files on source with a newer creation / modification datetime.
dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer was taking place, but no files will be moved.
- Return type:
None
- upload_derivatives(overwrite_existing_files='never', dry_run=False)
Upload all files in the derivatives top level folder.
- Parameters:
overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on target will never be overwritten by source. If "always", files on target will be overwritten by source if there is any difference in date or size. If "if_source_newer", files on target will only be overwritten by files on source with a newer creation / modification datetime.
dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer was taking place, but no files will be moved.
- Return type:
None
- download_rawdata(overwrite_existing_files='never', dry_run=False)
Download all files in the rawdata top level folder.
- Parameters:
overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on target will never be overwritten by source. If "always", files on target will be overwritten by source if there is any difference in date or size. If "if_source_newer", files on target will only be overwritten by files on source with a newer creation / modification datetime.
dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer was taking place, but no files will be moved.
- Return type:
None
- download_derivatives(overwrite_existing_files='never', dry_run=False)
Download all files in the derivatives top level folder.
- Parameters:
overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on target will never be overwritten by source. If "always", files on target will be overwritten by source if there is any difference in date or size. If "if_source_newer", files on target will only be overwritten by files on source with a newer creation / modification datetime.
dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer was taking place, but no files will be moved.
- Return type:
None
- upload_entire_project(overwrite_existing_files='never', dry_run=False)
Upload the entire project.
Includes every top level folder (e.g. rawdata, derivatives).
- Parameters:
overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on target will never be overwritten by source. If "always", files on target will be overwritten by source if there is any difference in date or size. If "if_source_newer", files on target will only be overwritten by files on source with a newer creation / modification datetime.
dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer was taking place, but no files will be moved.
- Return type:
None
- download_entire_project(overwrite_existing_files='never', dry_run=False)
Download the entire project.
Includes every top level folder (e.g. rawdata, derivatives).
- Parameters:
overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on target will never be overwritten by source. If "always", files on target will be overwritten by source if there is any difference in date or size. If "if_source_newer", files on target will only be overwritten by files on source with a newer creation / modification datetime.
dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer was taking place, but no files will be moved.
- Return type:
None
- upload_specific_folder_or_file(filepath, overwrite_existing_files='never', dry_run=False)
Upload a specific file or folder.
If transferring a single file, the path including the filename is required (see ‘filepath’ input). If a folder, wildcards “*” or “**” must be used to transfer all files in the folder (“*”) or all files and sub-folders (“**”).
- Parameters:
filepath (Union[str, Path]) – A string containing the full filepath.
overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on target will never be overwritten by source. If "always", files on target will be overwritten by source if there is any difference in date or size. If "if_source_newer", files on target will only be overwritten by files on source with a newer creation / modification datetime.
dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer was taking place, but no files will be moved.
- Return type:
None
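The difference between the "*" and "**" wildcards can be shown on a throwaway folder. This is a sketch of the selection rule as described above (files in the folder versus files in the folder and its sub-folders), using pathlib rather than datashuttle's transfer machinery; files_selected is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def files_selected(folder: Path, wildcard: str) -> list[str]:
    """List the files a trailing '*' or '**' would select inside folder."""
    if wildcard == "*":
        selected = [p for p in folder.iterdir() if p.is_file()]
    elif wildcard == "**":
        selected = [p for p in folder.rglob("*") if p.is_file()]
    else:
        raise ValueError("wildcard must be '*' or '**'")
    return sorted(p.relative_to(folder).as_posix() for p in selected)


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "sub-001"
    (folder / "ses-001").mkdir(parents=True)
    (folder / "notes.txt").write_text("top level file")
    (folder / "ses-001" / "data.csv").write_text("nested file")

    top_only = files_selected(folder, "*")  # files directly in sub-001
    recursive = files_selected(folder, "**")  # files in sub-001 and below
```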
- download_specific_folder_or_file(filepath, overwrite_existing_files='never', dry_run=False)
Download a specific file or folder.
If transferring a single file, the path including the filename is required (see ‘filepath’ input). If a folder, wildcards “*” or “**” must be used to transfer all files in the folder (“*”) or all files and sub-folders (“**”).
- Parameters:
filepath (Union[str, Path]) – A string containing the full filepath.
overwrite_existing_files (Literal['never', 'always', 'if_source_newer']) – If "never", files on target will never be overwritten by source. If "always", files on target will be overwritten by source if there is any difference in date or size. If "if_source_newer", files on target will only be overwritten by files on source with a newer creation / modification datetime.
dry_run (bool) – Perform a dry run of the transfer. This will produce output as if the file transfer was taking place, but no files will be moved.
- Return type:
None
- setup_ssh_connection()
Set up a connection to the central server using SSH.
Assumes the central_host_id and central_host_username are set in configs (see make_config_file() and update_config_file()). First, the server key will be displayed, requiring verification of the server ID. This will store the hostkey for all future use.
Next, the user is prompted to enter their password for the central cluster. Once it is entered, an SSH private / public key pair will be set up.
- Return type:
None
- write_public_key(filepath)
Save the public SSH key to a specified filepath.
By default, only the SSH private key is stored in the datashuttle configs folder. Use this function to save the public key.
- Parameters:
filepath (str) – Full filepath (including filename) to write the public key to.
- Return type:
None
- make_config_file(local_path, central_path=None, connection_method=None, central_host_id=None, central_host_username=None)
Initialize the configurations for datashuttle on the local machine.
Once initialised, these settings will be used each time the datashuttle is opened.
These settings are stored in a config file on the datashuttle path (not in the project folder) on the local machine. Use get_config_path() to get the full path to the saved config file.
Use update_config_file() to selectively update settings.
- Parameters:
local_path (str) – Path to the project folder on the local machine.
central_path (str | None) – Filepath to the central project. If the connection is local (i.e. connection_method = "local_filesystem"), this is the full path on the local filesystem. Otherwise, if the connection is via SSH (i.e. connection_method = "ssh"), this is the path to the project folder on the central machine. This must be a full path to the central folder: it cannot use the ~ home folder syntax (e.g. /nfs/nhome/live/jziminski).
connection_method (str | None) – The method used to connect to the central project filesystem, e.g. "local_filesystem" (e.g. a mounted drive) or "ssh".
central_host_id (Optional[str]) – Server address of the central host for the SSH connection, e.g. "ssh.swc.ucl.ac.uk".
central_host_username (Optional[str]) – Username with which to log in to the central host, e.g. "jziminski".
- Return type:
None
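The constraints stated for these arguments can be checked before calling make_config_file(). The helper below is a hypothetical pre-check, not part of datashuttle: it assumes that "~ home folder syntax" means a path starting with "~", and that an SSH setup needs both central_host_id and central_host_username (as setup_ssh_connection() assumes they are set).

```python
from typing import List, Optional


def config_problems(
    local_path: str,
    central_path: Optional[str] = None,
    connection_method: Optional[str] = None,
    central_host_id: Optional[str] = None,
    central_host_username: Optional[str] = None,
) -> List[str]:
    """Return a list of problems found in the arguments (empty if none)."""
    problems = []
    # Assumption: home folder syntax means a leading "~".
    if central_path is not None and central_path.startswith("~"):
        problems.append("central_path must be a full path, not a ~ path")
    # Assumption: SSH needs both the host address and the username.
    if connection_method == "ssh" and not (central_host_id and central_host_username):
        problems.append("ssh requires central_host_id and central_host_username")
    return problems


ok = config_problems(
    "/home/user/my_project",  # hypothetical local path
    central_path="/nfs/nhome/live/jziminski/my_project",
    connection_method="ssh",
    central_host_id="ssh.swc.ucl.ac.uk",
    central_host_username="jziminski",
)
bad = config_problems(
    "/home/user/my_project",
    central_path="~/my_project",
    connection_method="ssh",
)
```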
- get_datashuttle_path()
Return the path to the local datashuttle folder.
This is where configs and other datashuttle files are stored.
- Return type:
Path
- static get_existing_projects()
Return a list of existing project names found on the local machine.
This is based on project folders in the “home / .datashuttle” folder that contain valid config.yaml files.
- Return type:
List[Path]
- get_next_sub(top_level_folder, return_with_prefix=True, include_central=False)
Return the next subject number.
- Parameters:
top_level_folder (Literal['rawdata', 'derivatives']) – The top-level folder, "rawdata" or "derivatives".
return_with_prefix (bool) – If True, return the subject with the "sub-" prefix.
include_central (bool) – If False, only get names from local_path, otherwise from local_path and central_path. If in local-project mode, this flag is ignored.
- Returns:
The next subject ID.
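One plausible reading of "next subject number" is the highest existing subject number plus one. The sketch below works on a plain list of folder names under that assumption; next_subject, the zero-padding rule and the default width of 3 are all hypothetical and may differ from what get_next_sub() does.

```python
import re


def next_subject(existing_names, return_with_prefix=True):
    """Suggest the next subject label from a list of existing folder names."""
    numbers = []
    for name in existing_names:
        match = re.match(r"sub-(\d+)", name)
        if match:  # folders without a sub- prefix are ignored
            numbers.append(match.group(1))
    if numbers:
        next_number = max(int(n) for n in numbers) + 1
        width = max(len(n) for n in numbers)  # assumption: keep existing padding
    else:
        next_number, width = 1, 3  # assumption: start at 001
    label = f"{next_number:0{width}d}"
    return f"sub-{label}" if return_with_prefix else label


suggested = next_subject(["sub-001", "sub-002_id-x", "my_stuff"])
```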
- get_next_ses(top_level_folder, sub, return_with_prefix=True, include_central=False)
Return the next session number.
- Parameters:
top_level_folder (Literal['rawdata', 'derivatives']) – The top-level folder, "rawdata" or "derivatives".
sub (str) – Name of the subject to find the next session of.
return_with_prefix (bool) – If True, return with the "ses-" prefix.
include_central (bool) – If False, only get names from local_path, otherwise from local_path and central_path. If in local-project mode, this flag is ignored.
- Returns:
The next session ID.
- is_local_project()
Return a bool indicating whether the project is ‘local only’.
A project is ‘local-only’ if it has no central_path and no connection_method. It can be used to make folders and validate, but not for transfer.
- Return type:
bool
- get_name_templates()
Return the regexp templates used for validation.
If the “on” key is set to False, template validation is not performed.
- Returns:
e.g. {"name_templates": {"on": False, "sub": None, "ses": None}}
- Return type:
name_templates
- set_name_templates(new_name_templates)
Update the persistent settings with new name templates.
Name templates are regexps against which "sub" and "ses" names are validated when name_templates["on"] is set to True.
- Parameters:
new_name_templates (Dict) – e.g. {"name_templates": {"on": False, "sub": None, "ses": None}}, where "sub" or "ses" can be a regexp that subject and session names respectively are validated against.
- Return type:
None
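How a name template constrains names can be sketched with the standard re module. The dictionary shape follows the example above; the regexp itself, the helper names_failing_template and the use of a full-string match are assumptions made for illustration, not a description of datashuttle's validation code.

```python
import re

# Hypothetical template: three-digit subject number followed by an id key.
settings = {
    "name_templates": {
        "on": True,
        "sub": r"sub-\d\d\d_id-[a-zA-Z0-9]+",
        "ses": None,
    }
}


def names_failing_template(names, key, settings):
    """Return the names that do not match the template for 'sub' or 'ses'."""
    templates = settings["name_templates"]
    if not templates["on"] or templates[key] is None:
        return []  # template validation is switched off for this key
    pattern = re.compile(templates[key])
    # Assumption: the whole name must match the regexp.
    return [name for name in names if pattern.fullmatch(name) is None]


failing = names_failing_template(["sub-001_id-abc", "sub-1"], "sub", settings)
```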
- validate_project(top_level_folder, display_mode, include_central=False, strict_mode=False)
Perform validation on the project.
This checks the subject and session level folders to ensure there are no NeuroBlueprint formatting issues.
- Parameters:
top_level_folder (Optional[Literal['rawdata', 'derivatives']]) – Folder to check, either "rawdata" or "derivatives". If None, will check both folders.
display_mode (Literal['error', 'warn', 'print']) – The validation issues are displayed as "error" (raise error), "warn" (show warning) or "print".
include_central (bool) – If False, only the local project is validated. Otherwise, both local and central projects are validated. If in local-project mode, this flag is ignored.
strict_mode (bool) – If True, only allow NeuroBlueprint-formatted folders to exist in the project. By default, non-NeuroBlueprint folders (e.g. a folder called 'my_stuff' in 'rawdata') are allowed, and only folders starting with the sub- or ses- prefix are checked. In strict mode, any folder not prefixed with sub-, ses- or a valid datatype will raise a validation issue.
- Returns:
A list of validation errors found in the project.
- Return type:
error_messages
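The three display_mode options map onto familiar Python mechanisms. The sketch below shows one way such a switch could look; display_issue, the NamingIssue exception and the message text are made up for illustration, and the exception and warning types datashuttle actually uses are not stated here.

```python
import warnings


class NamingIssue(Exception):
    """Hypothetical exception type used only in this sketch."""


def display_issue(message: str, display_mode: str) -> None:
    """Report a validation issue as an error, a warning or printed text."""
    if display_mode == "error":
        raise NamingIssue(message)
    elif display_mode == "warn":
        warnings.warn(message)
    elif display_mode == "print":
        print(message)
    else:
        raise ValueError(f"Unknown display_mode: {display_mode!r}")


# "warn" lets the run continue and records the issue as a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    display_issue("example issue: sub-1 is badly formatted", "warn")
warned = [str(w.message) for w in caught]

# "error" stops at the first issue.
try:
    display_issue("example issue: sub-1 is badly formatted", "error")
    raised = False
except NamingIssue:
    raised = True
```

Whichever mode is chosen, validate_project() also returns the list of validation errors it found.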
- static check_name_formatting(names, prefix)
Format a list of subject or session names.
Pass a list of names to check how these will be auto-formatted, for example when passed to create_folders() or upload_custom(). Useful for checking tags, e.g. @TO@, @DATE@, @TIME@, @DATETIME@. This method will print the formatted list of names.
- Parameters:
names (Union[str, list]) – A string or list of subject or session names.
prefix (Literal['sub', 'ses']) – The relevant subject or session prefix, "sub" or "ses".
- Return type:
None
- datashuttle.quick_validate_project(project_path, top_level_folder='rawdata', display_mode='warn', strict_mode=False, name_templates=None)
Perform validation on a NeuroBlueprint-formatted project.
- Parameters:
project_path (str | Path) – Path to the project to validate. Must include the project name, and hold a "rawdata" or "derivatives" folder.
top_level_folder (Optional[Literal['rawdata', 'derivatives']]) – The top-level folder ("rawdata" or "derivatives") on which to perform validation. If None, both are checked.
display_mode (Literal['error', 'warn', 'print']) – The validation issues are displayed as "error" (raise error), "warn" (show warning), or "print".
strict_mode (bool) – If True, only allow NeuroBlueprint-formatted folders to exist in the project. By default, non-NeuroBlueprint folders (e.g. a folder called 'my_stuff' in 'rawdata') are allowed, and only folders starting with the sub- or ses- prefix are checked. In strict mode, any folder not prefixed with sub-, ses- or a valid datatype will raise a validation issue.
name_templates (Optional[Dict]) – A dictionary of templates for subject and session names to validate against. See DataShuttle.set_name_templates() for details.
- Returns:
A list of validation errors found in the project.
- Return type:
error_messages
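The effect of strict_mode on which folders are considered can be sketched as a simple classification. This only restates the parameter description: split_folders is a hypothetical helper, EXAMPLE_DATATYPES is a small illustrative subset of datatype names taken from the create_folders() examples, and the real validation checks far more than the prefix.

```python
# Illustrative subset only, not the full NeuroBlueprint datatype list.
EXAMPLE_DATATYPES = {"ephys", "behav", "anat"}


def split_folders(folder_names, strict_mode=False):
    """Return (checked, flagged) folder names for a given strict_mode."""
    checked, flagged = [], []
    for name in folder_names:
        if name.startswith(("sub-", "ses-")):
            checked.append(name)  # always validated against NeuroBlueprint
        elif strict_mode and name not in EXAMPLE_DATATYPES:
            flagged.append(name)  # only an issue when strict_mode is True
    return checked, flagged


folders = ["sub-001", "my_stuff", "behav"]
default_result = split_folders(folders)
strict_result = split_folders(folders, strict_mode=True)
```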