Matrix Assembly

class bsbolt.Matrix.AggregateMatrix(file_list=None, sample_list=None, min_site_coverage=10, site_proportion_threshold=0.9, output_path=None, cg_only=False, verbose=True, threads=1, count_matrix=False)

Aggregate CGmap files into a combined methylation matrix. The CGmap files are read twice: a first pass counts sites, and a second pass retrieves their methylation values. Although slower than a single pass, this avoids constructing large sparse matrices. Assembly is multi-threaded to improve performance.
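The two-pass strategy can be sketched in plain Python. This is a minimal illustration, not BSBolt's implementation. The function `two_pass_aggregate` is hypothetical. Its in-memory input is a dict that maps sample labels to lists of CGmap lines, standing in for files. The CGmap column layout is assumed to be: chromosome, nucleotide, position, context, dinucleotide context, methylation level, methylated reads, total reads. Pass one keeps only a per-site counter. Pass two fills rows only for the sites that survive filtering.

```python
from collections import Counter
from typing import Dict, List, Tuple

Site = Tuple[str, int]  # (chromosome, position)


def parse_cgmap_line(line: str) -> Tuple[Site, float, int]:
    # Assumed CGmap columns: chrom, nucleotide, position, context,
    # dinucleotide context, methylation level, methylated reads, total reads
    fields = line.rstrip("\n").split("\t")
    return (fields[0], int(fields[2])), float(fields[5]), int(fields[7])


def two_pass_aggregate(samples: Dict[str, List[str]],
                       min_site_coverage: int = 10,
                       site_proportion_threshold: float = 0.9
                       ) -> Tuple[List[Site], List[List[float]]]:
    """Hypothetical two-pass aggregation over in-memory CGmap lines."""
    # Pass 1: only count, per site, the samples with sufficient coverage.
    site_counts: Counter = Counter()
    for lines in samples.values():
        for line in lines:
            site, _, coverage = parse_cgmap_line(line)
            if coverage >= min_site_coverage:
                site_counts[site] += 1
    required = site_proportion_threshold * len(samples)
    kept = sorted(site for site, n in site_counts.items() if n >= required)
    row_of = {site: row for row, site in enumerate(kept)}

    # Pass 2: re-read every sample and fill only the retained rows,
    # leaving NaN where a sample has no valid call.
    matrix = [[float("nan")] * len(samples) for _ in kept]
    for col, lines in enumerate(samples.values()):
        for line in lines:
            site, level, coverage = parse_cgmap_line(line)
            if coverage >= min_site_coverage and site in row_of:
                matrix[row_of[site]][col] = level
    return kept, matrix
```

Memory in the first pass grows with the number of distinct sites, not with sites × samples. The full matrix is only allocated for the rows that are kept.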

Params:

  • file_list (list): list of CGmap file paths
  • sample_list (list): sample labels for the CGmap files; if not passed, labels are derived from the CGmap file names
  • min_site_coverage (int): minimum read coverage for a CpG site to be considered for the matrix, [10]
  • site_proportion_threshold (float): proportion of samples that must have a valid, non-null methylation call at a site for it to be included in the matrix, [0.9]
  • output_path (str): path to output file
  • cg_only (bool): if True, consider only CpG sites; otherwise consider all cytosines, [False]
  • verbose (bool): verbose matrix assembly, [True]
  • threads (int): threads available for matrix aggregation, [1]
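The way `min_site_coverage` and `site_proportion_threshold` combine can be shown with a small predicate. This is a sketch under assumptions. `site_passes` is a hypothetical helper. A call is treated as valid when its coverage meets `min_site_coverage`. The proportion is compared with `>=`, and the library's exact comparison may differ.

```python
from typing import Optional, Sequence


def site_passes(coverages: Sequence[Optional[int]],
                min_site_coverage: int = 10,
                site_proportion_threshold: float = 0.9) -> bool:
    """Hypothetical check for whether one site enters the matrix.

    coverages holds the read coverage at this site for every sample,
    with None where the site is absent from a sample's CGmap file.
    """
    valid = sum(1 for c in coverages
                if c is not None and c >= min_site_coverage)
    return valid / len(coverages) >= site_proportion_threshold


# With 10 samples and the defaults, 9 valid calls are enough, 8 are not.
print(site_passes([12] * 9 + [None]))     # True
print(site_passes([12] * 8 + [5, None]))  # False
```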

Attributes:

  • self.sample_list (list): list of samples as they are ordered in the methylation matrix
  • self.matrix_sites (tuple): tuple of ordered sites appearing in the methylation matrix, only set if no output path is provided
  • self.meth_matrix (np.array): array of methylation values with sites as rows and samples as columns, only set if no output path is provided
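The attributes line up as follows: row i of `meth_matrix` holds the values for `matrix_sites[i]`, and column j holds the values for `sample_list[j]`. The sketch below uses plain lists and `math.nan` in place of a NumPy array, so it runs without dependencies. The site labels, values, and the `methylation_at` helper are made up for illustration. Representing missing calls as NaN is also an assumption.

```python
import math

# Made-up stand-ins for the attributes described above.
sample_list = ["sample_a", "sample_b", "sample_c"]
matrix_sites = ("chr1:100", "chr1:250", "chr2:40")
meth_matrix = [
    [0.80, 0.75, math.nan],  # chr1:100
    [0.10, 0.05, 0.12],      # chr1:250
    [0.50, math.nan, 0.45],  # chr2:40
]


def methylation_at(site: str, sample: str) -> float:
    # Rows follow matrix_sites; columns follow sample_list.
    return meth_matrix[matrix_sites.index(site)][sample_list.index(sample)]


print(methylation_at("chr1:250", "sample_c"))  # 0.12
```

With a real NumPy array, the same lookup would be written `meth_matrix[row, col]`.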

Usage:

```python
matrix = AggregateMatrix(**kwargs)
matrix.aggregate_matrix()

# access samples
matrix.sample_list
# access site list
matrix.matrix_sites
# access methylation matrix
matrix.meth_matrix
```

aggregate_matrix(self)

Iterate through the passed CGmap files and aggregate them into a methylation matrix.

assemble_matrix(self, matrix_sites)

Append sites to the site list.

collect_matrix_sites(self)

Iterate through the individual CGmap files to obtain consensus site counts across samples.
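Each file is read independently when site counts are collected, so the per-file work parallelizes naturally. The sketch below is minimal and makes several assumptions. It uses a thread pool from `concurrent.futures` sized by `threads`, although BSBolt's own parallelism may use processes instead. It works on in-memory CGmap lines and assumes chromosome in column 0, position in column 2, and total reads in column 7. `count_sites` and `collect_site_counts` are hypothetical names. Each worker returns a `Counter`, and the results are summed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def count_sites(lines: List[str], min_site_coverage: int = 10) -> Counter:
    # Assumed CGmap layout: column 0 = chromosome, 2 = position, 7 = total reads.
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if int(fields[7]) >= min_site_coverage:
            counts[(fields[0], int(fields[2]))] += 1
    return counts


def collect_site_counts(samples: List[List[str]], threads: int = 1,
                        min_site_coverage: int = 10) -> Counter:
    """Hypothetical parallel first pass: samples covering each site."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        per_sample = pool.map(
            lambda lines: count_sites(lines, min_site_coverage), samples)
        for counts in per_sample:
            total.update(counts)  # Counter.update adds counts together
    return total
```

For CPU-bound parsing in CPython, `ProcessPoolExecutor` sidesteps the GIL. It requires a picklable top-level worker function rather than a lambda.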

get_output_object(self)

output_matrix(self, meth_matrix, matrix_sites)

Output the sorted, aggregated matrix.
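Writing a sorted matrix can be sketched with the standard `csv` module. The layout here is an assumption, not BSBolt's documented output format. The file is tab-separated, with a header that lists the samples. Rows are sorted by (chromosome, position), using lexicographic chromosome order. Sites are written as `chrom:pos`, missing values as `NA`, and values to three decimal places. `write_sorted_matrix` is a hypothetical name.

```python
import csv
import io
import math
from typing import Sequence, TextIO, Tuple


def write_sorted_matrix(handle: TextIO, sample_list: Sequence[str],
                        matrix_sites: Sequence[Tuple[str, int]],
                        meth_matrix: Sequence[Sequence[float]]) -> None:
    # Sort row indices by (chromosome, position); lexicographic order
    # places chr10 before chr2.
    order = sorted(range(len(matrix_sites)), key=lambda i: matrix_sites[i])
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["site", *sample_list])
    for i in order:
        chrom, pos = matrix_sites[i]
        values = ["NA" if math.isnan(v) else f"{v:.3f}" for v in meth_matrix[i]]
        writer.writerow([f"{chrom}:{pos}", *values])


buffer = io.StringIO()
write_sorted_matrix(buffer, ["a", "b"],
                    [("chr2", 5), ("chr1", 10)],
                    [[0.5, math.nan], [0.25, 1.0]])
print(buffer.getvalue())
```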