Index

class bsbolt.Index.ProcessCutSites(cut_format=None, cut_descriptor='-')

Processes cut format information and returns a dictionary with the restriction sequence, the reverse complement of the restriction sequence, and the cut offset.

Params:

  • cut_format (str): restriction enzyme recognition sequence, [C-CGG]
  • cut_descriptor (str): character used to designate cut site, [-]
get_recognition_site_sequences(self, recognition_site)

Params:

  • recognition_site (str): sequence of site

Returns:

  • forward_recognition_sites (list): list of forward recognition sites
  • reverse_recognition_sites (list): list of reverse recognition sites
get_site_offsets(self, site)

Given a restriction site, return the offset of the cut position within the recognition sequence.

Params:

  • site (str): recognition site sequence

Returns:

  • forward_offset (int/bool): offset as int if the cut descriptor is present, else False
  • reverse_offset (int/bool): offset as int if the cut descriptor is present, else False
process_cut_sites(self)

Process forward and reverse strand restriction sites
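
As a rough illustration of the information described above, the following sketch parses a cut format string into recognition sequences, reverse complements, and cut offsets. It is not the bsbolt implementation: `parse_cut_format` and `reverse_complement` are hypothetical helpers, and the way the reverse-strand offset is computed (sequence length minus forward offset) is an assumption made for this example.

```python
# Hypothetical helpers for illustration only; not part of bsbolt.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def parse_cut_format(cut_format, cut_descriptor="-"):
    """Map each recognition sequence to its reverse complement and offsets."""
    sites = {}
    for site in cut_format.split(","):
        site = site.strip().upper()
        # Position of the cut descriptor, or -1 if no cut site is designated
        descriptor_position = site.find(cut_descriptor)
        sequence = site.replace(cut_descriptor, "")
        if descriptor_position == -1:
            forward_offset = reverse_offset = False
        else:
            forward_offset = descriptor_position
            # Assumption: the reverse-strand offset mirrors the forward one
            reverse_offset = len(sequence) - descriptor_position
        sites[sequence] = {
            "reverse_complement": reverse_complement(sequence),
            "forward_offset": forward_offset,
            "reverse_offset": reverse_offset,
        }
    return sites


msp1 = parse_cut_format("C-CGG")
print(msp1["CCGG"])
```

Returning `False` rather than `None` when no cut descriptor is present follows the `int/bool` return types documented for `get_site_offsets`.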

class bsbolt.Index.RRBSBuild(reference_file=None, genome_database=None, lower_bound=30, upper_bound=500, cut_format='C-CGG', block_size=None, ignore_alt=False)

Formats reference sequence inputs for processing by BWA. Digests the reference sequence in silico and returns mappable regions that fall within the fragment size bounds. Fragments are defined relative to the restriction cut site if one is provided; otherwise the complete restriction sequence is considered part of the mappable fragment.

Params:

  • reference_file (str): path to reference file in fasta format
  • genome_database (str): directory to output processed datafiles
  • block_size (int): bwa indexing block size; larger values increase indexing speed but also memory consumption
  • lower_bound (int): smallest mappable fragment size
  • upper_bound (int): largest mappable fragment size
  • cut_format (str): comma-separated list of restriction sites, where - marks the cut position
  • ignore_alt (bool): ignore alt contigs when constructing alignment index

Usage:

index = RRBSBuild(**kwargs)
index.generate_rrbs_database()
generate_rrbs_database(self)

Wrapper for class functions to process and build mapping indices.

mask_contig(self, contig_str, mappable_regions)

Given a list of mappable regions, merges mappable fragments if a cut site isn't designated, then returns the DNA sequence as a string with unmappable regions masked.

Params:

  • contig_id (str): contig label
  • contig_str (str): str of DNA sequence
  • mappable_regions (list): list of mappable regions

Returns:

  • masked_contig_sequence (str): str of DNA sequence with unmappable regions masked
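
The masking step can be sketched as follows. This is a minimal illustration, not the bsbolt code: `mask_sequence` is a hypothetical function, and both the use of `N` as the masking character and the 0-based, half-open region coordinates are assumptions.

```python
# Hypothetical helper for illustration only; not part of bsbolt.
def mask_sequence(contig_str, mappable_regions, mask_char="N"):
    """Return contig_str with every base outside mappable_regions masked.

    mappable_regions is a list of (start, end) tuples, assumed here to be
    0-based and half-open. Overlapping regions are handled naturally.
    """
    # Start fully masked, then copy the mappable bases back in
    masked = [mask_char] * len(contig_str)
    for start, end in mappable_regions:
        masked[start:end] = contig_str[start:end]
    return "".join(masked)


print(mask_sequence("ACGTACGTAC", [(2, 5)]))
```

Masking rather than removing the unmappable bases keeps the coordinates of the masked contig identical to those of the original reference.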
process_contig_region(self, contig_id, contig_sequence)

Given a contig_id, outputs a pickle file of the whole sequence and a masked version of the sequence in which only mappable regions are reported.

Params:

  • contig_id (str): contig label
  • contig_sequence (list): a list of strings containing DNA sequence
process_rrbs_sequence(self, contig_str)

Designate mappable regions by finding all occurrences of the restriction site string in the passed DNA sequence. Merge restriction map into regions by considering pairs of downstream and upstream restriction sites that pass the size limits.

Params:

  • contig_str (str): string of continuous DNA sequence

Returns:

  • mappable_regions (list): list of tuples that contain the start and end positions of fragments that are within the size limits
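
The pairing of adjacent restriction sites described above can be sketched as below. This is a simplified illustration under stated assumptions, not the bsbolt algorithm: `find_mappable_regions` is a hypothetical function, it handles a single recognition sequence with no designated cut site (so the complete restriction sequence is kept in each fragment), and it returns 0-based, half-open coordinates.

```python
import re


# Hypothetical helper for illustration only; not part of bsbolt.
def find_mappable_regions(contig_str, recognition_site="CCGG",
                          lower_bound=30, upper_bound=500):
    """Return (start, end) tuples for fragments between adjacent sites."""
    # A lookahead pattern also reports overlapping site occurrences
    pattern = re.compile(f"(?={re.escape(recognition_site)})")
    site_starts = [m.start() for m in pattern.finditer(contig_str.upper())]

    regions = []
    # Pair each upstream site with the next downstream site
    for upstream, downstream in zip(site_starts, site_starts[1:]):
        start = upstream
        end = downstream + len(recognition_site)
        # Keep only fragments that pass the size limits
        if lower_bound <= end - start <= upper_bound:
            regions.append((start, end))
    return regions


contig = "CCGG" + "A" * 40 + "CCGG" + "A" * 10 + "CCGG"
print(find_mappable_regions(contig))
```

In this example only the first pair of sites yields a fragment inside the default 30 to 500 bp window; the second pair is too close together and is discarded.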
class bsbolt.Index.WholeGenomeBuild(reference_file=None, genome_database=None, mappable_regions=None, block_size=None, ignore_alt=False)

Class to build whole genome bisulfite bwa index.

Params:

  • reference_file (str): path to reference file in fasta format
  • genome_database (str): path to genome database output
  • mappable_regions (str): path to bed file of mappable regions for masked alignment index building
  • block_size (int): bwa indexing block size; larger values increase indexing speed but also memory consumption

Usage:

index = WholeGenomeBuild(**kwargs)
index.generate_bsb_database()
generate_bsb_database(self)

Wrapper for class functions to process and build mapping indices.

get_mappable_regions(bed_file)

Get mappable regions if masking with bed file

Params:

  • bed_file (str): path to bed file of mappable regions

Returns:

  • mappable_regions (list): sorted list of mappable regions
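
Reading mappable regions from a BED file might look like the sketch below. It is an assumption-laden illustration rather than the bsbolt implementation: `read_mappable_regions` is a hypothetical function, it reads only the first three BED columns, and it returns `(contig, start, end)` tuples sorted by contig name and then by start position.

```python
# Hypothetical helper for illustration only; not part of bsbolt.
def read_mappable_regions(bed_file):
    """Return a sorted list of (contig, start, end) tuples from a BED file."""
    regions = []
    with open(bed_file) as bed:
        for line in bed:
            # Skip blank lines, comments, and optional BED header lines
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            contig, start, end = line.split()[:3]
            regions.append((contig, int(start), int(end)))
    # Tuples sort by contig label first, then by numeric start and end
    regions.sort()
    return regions
```

Converting the coordinates to `int` before sorting matters: sorting the raw strings would place a region starting at 100 before one starting at 50.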
mask_contig(self, contig_id, contig_str)

Mask contig sequence outside mappable regions

Params:

  • contig_id (str): contig label
  • contig_str (str): contig sequence

Returns:

  • contig_str (str): masked sequence
process_contig(self, contig_id, contig_str)

Process contig sequence