Dataset

class dataset.Dataset

Bases: object

The core API for working with a dataset.

add_derivative_data(source_path, subject, sample, copy=True, overwrite=True)

Add derivative data of a sample to the correct SDS location and update the relevant metadata files. Requires the folder structure to already be in place.

Parameters:
  • source_path (string) – original location of raw data

  • subject (string) – subject id

  • sample (string) – sample id

  • sds_parent_dir (string, optional) – path to existing sds dataset parent

  • copy (bool, optional) – if True, source directory data will not be deleted after copying, defaults to True

  • overwrite (bool, optional) – if True, any data in the destination folder will be overwritten, defaults to True

Raises:

NotADirectoryError – raised if the derivative in sds_parent_dir is not a folder.
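The copy and overwrite flags described above can be illustrated with a small standard-library sketch. This is not the library's implementation: `place_sample_data` is a hypothetical helper, the `derivative/<subject>/<sample>` layout is an assumption, and raising `FileExistsError` when overwrite is False is a guess at a reasonable behaviour that the description does not specify.

```python
import shutil
import tempfile
from pathlib import Path


def place_sample_data(source_path, dest_dir, copy=True, overwrite=True):
    """Hypothetical helper mirroring the documented copy/overwrite flags."""
    source = Path(source_path)
    dest = Path(dest_dir)
    if dest.exists():
        if not dest.is_dir():
            raise NotADirectoryError(f"{dest} is not a folder")
        if any(dest.iterdir()) and not overwrite:
            # Assumption: refuse to touch existing data when overwrite is False.
            raise FileExistsError(f"{dest} already contains data")
        shutil.rmtree(dest)  # clear the destination before writing
    dest.parent.mkdir(parents=True, exist_ok=True)
    if copy:
        shutil.copytree(source, dest)  # source directory is kept
    else:
        shutil.move(str(source), str(dest))  # source directory is removed
    return dest


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    raw.mkdir()
    (raw / "scan.txt").write_text("data")
    # Assumed destination layout: derivative/<subject>/<sample>
    dest = Path(tmp) / "derivative" / "sub-1" / "sam-1"
    place_sample_data(raw, dest, copy=True)
    kept_source = raw.exists()  # copy=True leaves the source in place
    copied = (dest / "scan.txt").read_text()
```

With copy=False the same helper moves the source folder instead, so the original location no longer exists afterwards.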

add_subjects(subjects)

Add a list of subjects to the dataset. This function adds the subjects and their samples to the metadata, and moves the sample files from their original source path to the sample folder of the subject under the dataset's primary folder. The manifest and dataset_description metadata files are updated automatically.

Parameters:

subjects (list) – Subject dataset

add_thumbnail(source_path, copy=True, overwrite=True)

create_empty_dataset(version='2.0.0')

Create an empty dataset from the template for the given dataset version.

Parameters:

version (‘2.0.0’ | ‘1.2.3’) – the dataset version

delete_data(destination_path)

Delete a file based on its file path in the dataset. The manifest metadata is updated automatically.

TODO: connect this to deleting samples and subjects, and update the subject and sample metadata.

Parameters:

destination_path (str) – the file path that you want to delete

Returns:

delete_sample(destination_path, data_type='primary')
Parameters:
  • destination_path (str) – the sample folder path that you want to delete

  • data_type (str) – “primary” | “derivative”

Returns:

delete_samples(destination_paths, data_type='primary')
Parameters:
  • destination_paths (list) – a list of sample folder paths to delete

  • data_type (str) – “primary” | “derivative”

Returns:

delete_subject(destination_path, data_type='primary')
Parameters:
  • destination_path (str) – the subject folder path that you want to delete

  • data_type (str) – “primary” | “derivative”

Returns:

delete_subjects(destination_paths, data_type='primary')
Parameters:
  • destination_paths (str[]) – the subject folder paths that you want to delete

  • data_type (str) – “primary” | “derivative”

Returns:

get_dataset()
Returns:

current dataset dict

get_dataset_path()

Return the path to the dataset directory.

Returns:

path to the dataset directory

Return type:

string

get_metadata(metadata_file)

Get a Metadata object based on the metadata file name. Use the returned object to edit the values of that metadata file.

Parameters:

metadata_file (string) – one of: code_description, code_parameters, dataset_description, manifest, performances, resources, samples, subjects, submission

Returns:

a metadata editor for the specified metadata file
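Because metadata_file must be one of a fixed set of names, a caller may want to validate the name before requesting an editor. The following is a minimal sketch of such a pre-check; `check_metadata_file` is a hypothetical helper that is not part of the Dataset API, and how get_metadata itself reacts to an unknown name is not stated here.

```python
# Names copied from the metadata_file parameter description above.
METADATA_FILES = (
    "code_description",
    "code_parameters",
    "dataset_description",
    "manifest",
    "performances",
    "resources",
    "samples",
    "subjects",
    "submission",
)


def check_metadata_file(name):
    """Hypothetical pre-check: return the name if valid, else raise ValueError."""
    if name not in METADATA_FILES:
        expected = ", ".join(METADATA_FILES)
        raise ValueError(f"unknown metadata file {name!r}; expected one of: {expected}")
    return name


checked = check_metadata_file("subjects")
# With a real Dataset instance, the checked name would then be passed on:
# metadata = dataset.get_metadata(checked)
```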

get_subject(subject_sds_id) → Subject

Get a subject by its subject SDS id.

Parameters:

subject_sds_id (str) – subject sds id

Returns:

Subject

list_elements(metadata_file, axis=0, version=None)

List the fields of a metadata file.

Parameters:
  • metadata_file (string) – metadata file name

  • axis (int) – if axis=0, column-based: list all column headers, i.e. the first row. If axis=1, row-based: list all row indexes, i.e. the first column of each row

  • version (string) – reference template version

Returns:

a list of fields

Return type:

list
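The two axis values are easiest to see on a small table. The sketch below mimics the described behaviour on a plain list of rows. `list_elements_sketch` and the sample cell values are made up for illustration, and whether the real method includes the header cell when axis=1 is not specified above; this sketch keeps every row.

```python
def list_elements_sketch(table, axis=0):
    """Illustrative stand-in: `table` is a list of rows, each a list of cells."""
    if axis == 0:
        # Column-based: the column headers, i.e. the first row.
        return list(table[0])
    if axis == 1:
        # Row-based: the first cell of every row (header row included here).
        return [row[0] for row in table]
    raise ValueError("axis must be 0 or 1")


# Made-up sample data, not taken from a real template.
column_based = [
    ["subject id", "species", "age"],
    ["sub-1", "rat", "12 weeks"],
]
row_based = [
    ["Metadata element", "Value"],
    ["Title", "Example dataset"],
    ["Keywords", "heart"],
]

headers = list_elements_sketch(column_based, axis=0)
row_index = list_elements_sketch(row_based, axis=1)
```

Column-based listing suits files where each row is a record, while row-based listing suits files where each row is a named element with a value.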

list_metadata_files(version, print_list=True)

List all metadata files, based on the metadata files in the template dataset.

Parameters:

version (string) – reference template version

Returns:

all metadata files

Return type:

list

load_dataset(dataset_path=None, from_template=False, version=None)

Load the input dataset into a dictionary

Parameters:
  • dataset_path (string) – path to the dataset

  • from_template (bool) – whether to load the dataset from a SPARC template

  • version (string) – dataset version

Returns:

loaded dataset

Return type:

dict
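A common pattern is to load a dataset from the template and then write it to disk with save. The sketch below uses only the load_dataset and save signatures documented on this page and takes the dataset object as an argument, so it runs here against a stand-in. `create_from_template` and `RecordingDataset` are hypothetical, and whether these two calls are sufficient with the real class is an assumption.

```python
def create_from_template(dataset, save_dir, version="2.0.0"):
    """Load the template for `version` into `dataset`, then save it to `save_dir`."""
    loaded = dataset.load_dataset(from_template=True, version=version)
    dataset.save(save_dir=save_dir)
    return loaded


class RecordingDataset:
    """Stand-in that records calls; used only to exercise the sketch."""

    def __init__(self):
        self.calls = []

    def load_dataset(self, dataset_path=None, from_template=False, version=None):
        self.calls.append(("load_dataset", from_template, version))
        return {}

    def save(self, save_dir="", remove_empty=False, keep_style=False):
        self.calls.append(("save", save_dir))


stub = RecordingDataset()
result = create_from_template(stub, "./my_dataset")
```

In real use, an instance of the Dataset class would be passed in place of the stand-in.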

load_metadata(path)

Load and update a single metadata file.

Parameters:

path (string) – path to the metadata file

Returns:

metadata

Return type:

pandas.DataFrame

remove_thumbnail(destination_path)

Delete a thumbnail from the dataset. The manifest metadata is updated automatically.

Parameters:

destination_path (str) – The thumbnail path in dataset that you want to delete.

save(save_dir='', remove_empty=False, keep_style=False)

Save dataset

Parameters:
  • save_dir (string) – path to the destination directory

  • remove_empty (bool) – (optional) If True, remove rows which do not have values in the “Value” field
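The effect of remove_empty can be pictured as a row filter on the “Value” field. The following is a minimal sketch of that idea on a list of dictionaries. `drop_empty_value_rows` and the sample rows are made up, and treating None and blank strings as empty is an assumption about what "do not have values" means.

```python
def drop_empty_value_rows(rows, field="Value"):
    """Keep only rows whose `field` holds a non-blank value."""
    kept = []
    for row in rows:
        value = row.get(field)
        # Assumption: None and whitespace-only strings count as empty.
        if value is None or str(value).strip() == "":
            continue
        kept.append(row)
    return kept


# Made-up sample rows, not taken from a real template.
rows = [
    {"Metadata element": "Title", "Value": "Example dataset"},
    {"Metadata element": "Keywords", "Value": ""},
    {"Metadata element": "Funding", "Value": None},
]
kept = drop_empty_value_rows(rows)
```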

set_path(path)

Set the dataset path, and pass the same path on to the Sample and Subject classes.

Parameters:

path (string) – path to the dataset directory

update_by_json(metadata_file, json_file)

Update a metadata file from a given JSON file.

Parameters:
  • metadata_file (string) – metadata file name

  • json_file (string) – path to the metadata file in JSON format