Dataset
- class dataset.Dataset
Bases: object
The core API for datasets.
- add_derivative_data(source_path, subject, sample, copy=True, overwrite=True)
Add raw data of a sample to the correct SDS location and update the relevant metadata files. Requires the folder structure to already be in place.
- Parameters:
source_path (string) – original location of raw data
subject (string) – subject id
sample (string) – sample id
sds_parent_dir (string, optional) – path to existing sds dataset parent
copy (bool, optional) – if True, source directory data will not be deleted after copying, defaults to True
overwrite (bool, optional) – if True, any data in the destination folder will be overwritten, defaults to True
- Raises:
NotADirectoryError – raised if the derivative in sds_parent_dir is not a folder.
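The copy and overwrite flags described above can be illustrated with a small standalone sketch. It does not use the library; `place_data` is a hypothetical helper, and its reading of the flags (copy=False removes the source afterwards, overwrite=False keeps files that already exist at the destination) is an assumption rather than the confirmed behaviour of add_derivative_data.

```python
import shutil
from pathlib import Path


def place_data(source_path, destination, copy=True, overwrite=True):
    """Hypothetical helper mirroring the documented copy/overwrite flags."""
    source, destination = Path(source_path), Path(destination)
    if destination.exists() and not destination.is_dir():
        # Mirrors the documented NotADirectoryError case.
        raise NotADirectoryError(f"{destination} is not a folder")
    destination.mkdir(parents=True, exist_ok=True)

    for item in sorted(source.rglob("*")):
        target = destination / item.relative_to(source)
        if item.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        elif overwrite or not target.exists():
            # Assumption: overwrite=False leaves existing files untouched.
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(item, target)

    if not copy:
        # Assumption: copy=False means "move", so the source is removed.
        shutil.rmtree(source)
    return destination
```

With copy=True the source folder is still present after the call, which matches the description of the copy parameter above.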
- add_subjects(subjects)
Add a list of subjects to the dataset. This function adds subjects and samples to the metadata and moves the sample files from their original source path to the dataset's primary subject/sample folder. It automatically updates the manifest and dataset_description metadata files.
- Parameters:
subjects (list) – Subject dataset
- add_thumbnail(source_path, copy=True, overwrite=True)
- create_empty_dataset(version='2.0.0')
Create an empty dataset from the template for the given dataset version.
- Parameters:
version (‘2.0.0’ | ‘1.2.3’) – the dataset version
- delete_data(destination_path)
Delete a file from the dataset by its file path. The manifest metadata is updated automatically. TODO: connect this to sample and subject deletion, and update the subject and sample metadata.
- Parameters:
destination_path (str) – the file path that you want to delete
- Returns:
- delete_sample(destination_path, data_type='primary')
- Parameters:
destination_path – the sample folder path that you want to delete
data_type (str) – “primary” | “derivative”
- Returns:
- delete_samples(destination_paths, data_type='primary')
- Parameters:
destination_paths (list) – a list of sample folder paths to delete
data_type (str) – “primary” | “derivative”
- Returns:
- delete_subject(destination_path, data_type='primary')
- Parameters:
destination_path (str) – the subject folder path that you want to delete
data_type (str) – “primary” | “derivative”
- Returns:
- delete_subjects(destination_paths, data_type='primary')
- Parameters:
destination_paths (list of str) – the subject folder paths that you want to delete
data_type (str) – “primary” | “derivative”
- Returns:
- get_dataset()
- Returns:
current dataset dict
- get_dataset_path()
Return the path to the dataset directory.
- Returns:
path to the dataset directory
- Return type:
string
- get_metadata(metadata_file)
Get a Metadata object by metadata file name. The returned object is used to edit the values of that metadata file.
- Parameters:
metadata_file (string) – one of: code_description, code_parameters, dataset_description, manifest, performances, resources, samples, subjects, submission
- Returns:
a metadata editor for the specified metadata file
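Because get_metadata only accepts the names listed above, it can help to validate a name before passing it in. The following is a minimal sketch; `check_metadata_name` is a hypothetical helper, not part of the library, and the trimming and lower-casing it applies are assumptions about what is convenient rather than something get_metadata is known to do.

```python
# Names copied from the parameter description of get_metadata.
METADATA_FILES = (
    "code_description",
    "code_parameters",
    "dataset_description",
    "manifest",
    "performances",
    "resources",
    "samples",
    "subjects",
    "submission",
)


def check_metadata_name(name):
    """Return a normalised metadata file name, or raise ValueError."""
    normalised = name.strip().lower()
    if normalised not in METADATA_FILES:
        raise ValueError(
            f"unknown metadata file {name!r}; expected one of {METADATA_FILES}"
        )
    return normalised
```

The checked name could then be handed to get_metadata on an existing Dataset instance.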
- get_subject(subject_sds_id) → Subject
Get a subject by subject sds id
- Parameters:
subject_sds_id (str) – subject sds id
- Returns:
Subject
- list_elements(metadata_file, axis=0, version=None)
List the fields of a metadata file.
- Parameters:
metadata_file (string) – metadata file name
axis (int) – if axis=0, column-based: list all column headers, i.e. the first row. If axis=1, row-based: list all row indexes, i.e. the first column of each row
version (string) – reference template version
- Returns:
a list of fields
- Return type:
list
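The two axis values can be pictured on a plain table. This is a sketch on a list of rows, not the library's implementation; `list_elements_sketch` and the sample rows are hypothetical, and skipping the header row for axis=1 is an assumption.

```python
def list_elements_sketch(table, axis=0):
    """List fields of a table given as a list of rows (first row = headers)."""
    if axis == 0:
        # Column-based: the column headers, i.e. the first row.
        return list(table[0])
    if axis == 1:
        # Row-based: the first cell of each data row.
        # Assumption: the header row itself is not included.
        return [row[0] for row in table[1:]]
    raise ValueError("axis must be 0 or 1")


# Hypothetical excerpt of a metadata table.
table = [
    ["Metadata element", "Description", "Value"],
    ["Title", "Title of the dataset", ""],
    ["Keywords", "Search keywords", ""],
]
column_headers = list_elements_sketch(table, axis=0)
row_indexes = list_elements_sketch(table, axis=1)
```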
- list_metadata_files(version, print_list=True)
List all metadata files, based on the metadata files in the template dataset.
- Parameters:
version (string) – reference template version
- Returns:
all metadata file names
- Return type:
list
- load_dataset(dataset_path=None, from_template=False, version=None)
Load the input dataset into a dictionary
- Parameters:
dataset_path (string) – path to the dataset
from_template (bool) – whether to load the dataset from a SPARC template
version (string) – dataset version
- Returns:
loaded dataset
- Return type:
dict
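A typical use of load_dataset is to start from a template and then write the result to disk with save. The sketch below uses only the signatures listed in this reference. `create_from_template` is a hypothetical helper, and the Dataset instance is passed in rather than constructed because the constructor is not described in this section.

```python
def create_from_template(dataset, save_dir, version="2.0.0"):
    """Load the template for `version` into `dataset`, then save to `save_dir`.

    `dataset` is expected to be a dataset.Dataset instance (assumption:
    any object exposing load_dataset and save with the documented
    keyword arguments will work).
    """
    # from_template=True asks for the template instead of an existing path.
    loaded = dataset.load_dataset(from_template=True, version=version)
    # save_dir is the destination directory documented for save().
    dataset.save(save_dir=save_dir)
    return loaded
```

Passing the instance in keeps the helper independent of how the Dataset object was created.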
- load_metadata(path)
Load and update a single metadata file
- Parameters:
path (string) – path to the metadata file
- Returns:
metadata
- Return type:
pandas.DataFrame
- remove_thumbnail(destination_path)
Delete a thumbnail from the dataset. The manifest metadata is updated automatically.
- Parameters:
destination_path (str) – The thumbnail path in dataset that you want to delete.
- save(save_dir='', remove_empty=False, keep_style=False)
Save dataset
- Parameters:
save_dir (string) – path to the destination directory
remove_empty (bool, optional) – if True, remove rows that have no value in the “Value” field
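The effect of remove_empty can be sketched on rows held as dictionaries. This is an illustration only; `drop_empty_value_rows` is a hypothetical helper, and treating None and whitespace-only strings as "no value" is an assumption about what counts as empty.

```python
def drop_empty_value_rows(rows):
    """Keep only rows whose "Value" field holds something."""

    def has_value(row):
        value = row.get("Value")
        # Assumption: None and blank strings count as empty; 0 does not.
        return value is not None and str(value).strip() != ""

    return [row for row in rows if has_value(row)]


# Hypothetical rows of a metadata table.
rows = [
    {"Metadata element": "Title", "Value": "My dataset"},
    {"Metadata element": "Keywords", "Value": ""},
    {"Metadata element": "Number of subjects", "Value": 0},
    {"Metadata element": "Funding", "Value": None},
]
kept = drop_empty_value_rows(rows)
```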
- set_path(path)
Set the dataset path, and pass the same path on to the Sample and Subject classes.
- Parameters:
path (string) – path to the dataset directory
- update_by_json(metadata_file, json_file)
Update a metadata file from a given JSON file.
- Parameters:
metadata_file (string) – metadata file name
json_file (string) – path to the metadata file in JSON