Imputation

class bsbolt.Impute.ImputeMissingValues(input_matrix_file=None, batch_size=None, imputation_window_size=3000000, k=5, threads=4, verbose=False, sep='\t', output_path=None, randomize_batch=False, meth_matrix=None, meth_site_order=None, sample_ids=None)

Launch a kNN imputation task. This wrapper imports data for imputation, splits the data for batch imputation, and combines the data after imputation. Data is held in memory for access during imputation. If the closest neighbors are null, a null imputed value is returned.
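
The kNN step this wrapper drives can be sketched in NumPy for a single window of sites. This is a minimal illustration, not bsbolt's implementation. It assumes sites are rows and samples are columns, measures distance as the mean squared difference over sites observed in both samples, and averages the non-null values of the k nearest samples. `knn_impute_window` is a hypothetical name.

```python
import numpy as np


def knn_impute_window(window: np.ndarray, k: int = 5) -> np.ndarray:
    """Fill NaNs in a (sites x samples) window from the k nearest samples.

    Assumed, not taken from bsbolt: samples are columns, distance is the mean
    squared difference over sites observed in both samples, and the imputed
    value is the mean of the neighbors' non-null values at that site.
    """
    imputed = window.copy()
    observed = ~np.isnan(window)
    n_samples = window.shape[1]
    for target in range(n_samples):
        missing_sites = np.flatnonzero(~observed[:, target])
        if missing_sites.size == 0:
            continue
        # Distance from the target sample to every other sample.
        distances = np.full(n_samples, np.inf)
        for other in range(n_samples):
            shared = observed[:, target] & observed[:, other]
            if other != target and shared.any():
                diff = window[shared, target] - window[shared, other]
                distances[other] = np.mean(diff ** 2)
        # Keep the k closest samples that share at least one observed site.
        nearest = np.argsort(distances)[:k]
        nearest = nearest[np.isfinite(distances[nearest])]
        for site in missing_sites:
            values = window[site, nearest]
            values = values[~np.isnan(values)]
            # If every neighbor is null at this site, the value stays NaN.
            imputed[site, target] = values.mean() if values.size else np.nan
    return imputed
```

The input is copied, so the original window keeps its NaNs; a real implementation would likely vectorize the distance computation rather than loop over sample pairs.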

Params:

  • input_matrix_file (str): Path to bsbolt matrix
  • batch_size (int): Batch size for batch imputation
  • imputation_window_size (int): Size (bp) for imputation window, [3,000,000]
  • k (int): Nearest neighbors used for imputation, [5]
  • threads (int): Number of threads available for imputation, [4]
  • verbose (bool): Verbose imputation, [False]
  • sep (str): Separator character used in the methylation matrix, [\t]
  • output_path (str): Output path for the imputed matrix
  • randomize_batch (bool): Randomize sample assignment to batches, [False]
  • meth_matrix (np.ndarray): Imputed methylation matrix
  • meth_site_order (list): Ordered methylation sites, sorted by contig then position
  • sample_ids (list): Sample names
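
imputation_window_size limits imputation to a window of sites along the genome. Assuming site labels of the form `contig:position` (an assumption; the matrix's label format is not given here) and fixed, non-overlapping windows (which may differ from bsbolt's windowing), rows can be binned as below. Because meth_site_order is sorted by contig then position, each bin is a contiguous run of rows. `window_sites` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def window_sites(site_labels: List[str],
                 window_size: int = 3_000_000) -> Dict[Tuple[str, int], List[int]]:
    """Group row indices into (contig, window index) bins of window_size bp."""
    windows: Dict[Tuple[str, int], List[int]] = defaultdict(list)
    for row, label in enumerate(site_labels):
        # Assumed label format "contig:position"; rsplit tolerates ':' in contig names.
        contig, position = label.rsplit(":", 1)
        windows[(contig, int(position) // window_size)].append(row)
    return dict(windows)


print(window_sites(["chr1:100", "chr1:2999999", "chr1:3000000", "chr2:5"]))
```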
get_batch_data(self, batch)

Return methylation values for batch imputation.

Returns:

  • batch_array (np.ndarray): Array of methylation values
  • sample_labels (list): List of samples in the batch
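
A hypothetical `select_batch` helper shows the shape of what get_batch_data returns. It assumes meth_matrix stores sites as rows and samples as columns, and that a batch is a list of sample column indices; neither detail is confirmed by this reference.

```python
from typing import List, Sequence, Tuple

import numpy as np


def select_batch(meth_matrix: np.ndarray, sample_ids: Sequence[str],
                 batch: Sequence[int]) -> Tuple[np.ndarray, List[str]]:
    """Return the batch's methylation columns and the matching sample labels."""
    columns = list(batch)
    batch_array = meth_matrix[:, columns]  # fancy indexing returns a copy
    sample_labels = [sample_ids[i] for i in columns]
    return batch_array, sample_labels
```

Because the slice is a copy, imputing batch_array does not touch meth_matrix until the values are written back explicitly.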
get_output_matrix(output_path)

Return the output object used to write the imputed matrix.

import_matrix(self)

impute_values(self)

Launch kNN imputation for each batch and set the imputed values in the original matrix.
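
The batch loop described above can be sketched as follows, under the same sites-by-samples layout assumption. The imputer is passed in as a function; `fill_with_site_mean` is a hypothetical stand-in (not kNN) that keeps the example self-contained, and a kNN imputer could be supplied in its place. `impute_in_batches` is likewise hypothetical.

```python
from typing import Callable, Sequence

import numpy as np


def fill_with_site_mean(batch_array: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in imputer: fill NaNs with the batch mean at each site."""
    filled = batch_array.copy()
    for row in filled:  # each row is a view into `filled`
        observed = row[~np.isnan(row)]
        if observed.size:  # rows that are entirely NaN stay NaN
            row[np.isnan(row)] = observed.mean()
    return filled


def impute_in_batches(meth_matrix: np.ndarray,
                      batches: Sequence[Sequence[int]],
                      impute_fn: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Impute each batch of sample columns and write the result back in place."""
    for batch in batches:
        columns = list(batch)
        batch_array = meth_matrix[:, columns]  # fancy indexing makes a copy
        meth_matrix[:, columns] = impute_fn(batch_array)
    return meth_matrix
```

Writing back by column index means each sample's values are only ever updated from the batch it was assigned to.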

launch_genome_imputation(self, meth_array, sample_labels)

output_imputed_matrix(self)

Write the imputed values to output_path.
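
A minimal sketch of writing the matrix as delimited text, assuming one row per site and one column per sample. The `Site` header label and the `NaN` spelling for values that remain null are assumptions, not taken from bsbolt's output format; `write_imputed_matrix` is hypothetical.

```python
import io
from typing import Sequence, TextIO

import numpy as np


def write_imputed_matrix(handle: TextIO, meth_matrix: np.ndarray,
                         meth_site_order: Sequence[str],
                         sample_ids: Sequence[str], sep: str = "\t") -> None:
    """Write a (sites x samples) matrix as delimited text, one site per row."""
    # Hypothetical header: a site column followed by one column per sample.
    handle.write(sep.join(["Site", *sample_ids]) + "\n")
    for site, row in zip(meth_site_order, meth_matrix):
        # Values still null after imputation are written as "NaN" (assumed).
        cells = ["NaN" if np.isnan(value) else f"{value:g}" for value in row]
        handle.write(sep.join([site, *cells]) + "\n")


buffer = io.StringIO()
write_imputed_matrix(buffer, np.array([[0.5, np.nan], [1.0, 0.25]]),
                     ["chr1:100", "chr1:200"], ["s1", "s2"])
print(buffer.getvalue(), end="")
```

An in-memory io.StringIO stands in for the real output file here; in the documented class the handle would presumably come from get_output_matrix.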

process_batch(self, imputation_order)

Generate sample batches
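
Sample batching can be sketched as chunking an ordering of sample indices into groups of batch_size, with an optional shuffle for randomize_batch. Treating imputation_order as a list of sample indices, returning a single batch when batch_size is None, and keeping a short final batch are all assumptions of this sketch; `make_sample_batches` is hypothetical.

```python
import random
from typing import List, Optional, Sequence


def make_sample_batches(imputation_order: Sequence[int],
                        batch_size: Optional[int],
                        randomize: bool = False,
                        seed: Optional[int] = None) -> List[List[int]]:
    """Split sample indices into batches of at most batch_size samples."""
    order = list(imputation_order)
    if randomize:
        # Shuffle a copy so the caller's ordering is left untouched.
        random.Random(seed).shuffle(order)
    if not batch_size:
        # Assumed: no batch size means all samples are imputed together.
        return [order]
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


print(make_sample_batches(range(5), 2))  # [[0, 1], [2, 3], [4]]
```

Passing a seed makes a randomized split reproducible, which helps when comparing imputation runs.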