Index
bsbolt.Index.ProcessCutSites
(cut_format=None, cut_descriptor='-')
Process cut format information and return a dictionary containing the restriction sequence, the reverse complement of the restriction sequence, and the cut offset.
Params:
- cut_format (str): restriction enzyme recognition sequence, [C-CGG]
- cut_descriptor (str): character used to designate cut site, [-]
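As a rough illustration of the parsing described above, the sketch below splits a cut-format string on commas, strips the cut descriptor, and records each site's reverse complement and offset. The function name `process_cut_format` and the shape of the returned dictionary are assumptions for illustration, not the actual bsbolt API.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def process_cut_format(cut_format, cut_descriptor="-"):
    """Hypothetical sketch: parse a comma-separated cut format such as 'C-CGG'."""
    cut_sites = {}
    for site in cut_format.upper().split(","):
        site = site.strip()
        # Restriction sequence with the cut descriptor removed
        sequence = site.replace(cut_descriptor, "")
        cut_sites[sequence] = {
            # Reverse complement: complement each base, then reverse the string
            "reverse_complement": sequence.translate(COMPLEMENT)[::-1],
            # Number of bases before the cut; False when no cut is designated
            "offset": site.index(cut_descriptor) if cut_descriptor in site else False,
        }
    return cut_sites


print(process_cut_format("C-CGG"))
# {'CCGG': {'reverse_complement': 'CCGG', 'offset': 1}}
```

MspI's site CCGG is palindromic, so its reverse complement is identical; a non-palindromic site such as `AC-G` would map to `CGT`.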
get_recognition_site_sequences
(self, recognition_site)
Params:
- recognition_site (str): sequence of site
Returns:
- forward_recognition_sites (list): list of forward recognition sites
- reverse_recognition_sites (list): list of reverse recognition sites
get_site_offsets
(self, site)
Given a restriction site, return the offset, i.e. the position within the site where the cut occurs.
Params:
- site (str): recognition site sequence
Returns:
- forward_offset (int/bool): offset (int) if the site contains the cut descriptor, else False
- reverse_offset (int/bool): offset (int) if the site contains the cut descriptor, else False
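One way to derive both offsets is to reverse-complement the annotated site while leaving the cut descriptor in place, so the descriptor's new index gives the cut position when the site is read on the opposite strand. This is a hypothetical standalone sketch, not the bsbolt method, and whether bsbolt measures the reverse offset this way is an assumption.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def get_site_offsets(site, cut_descriptor="-"):
    """Hypothetical sketch: return (forward_offset, reverse_offset) for an annotated site."""
    if cut_descriptor not in site:
        # No cut designated, mirroring the False return described above
        return False, False
    forward_offset = site.index(cut_descriptor)
    # Complement and reverse the annotated site; the descriptor is not in the
    # translation table, so it stays at the cut position on the other strand
    reverse_site = site.translate(COMPLEMENT)[::-1]
    reverse_offset = reverse_site.index(cut_descriptor)
    return forward_offset, reverse_offset


print(get_site_offsets("C-CGG"))    # (1, 3)
print(get_site_offsets("G-AATTC"))  # (1, 5)
```

For a palindromic site such as CCGG the two offsets are simply `offset` and `len(site) - offset`, which is what produces the characteristic overhang between the strands.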
process_cut_sites
(self)
Process forward and reverse strand restriction sites.
bsbolt.Index.RRBSBuild
(reference_file=None, genome_database=None, lower_bound=30, upper_bound=500, cut_format='C-CGG', block_size=None, ignore_alt=False)
Format reference sequence inputs for processing by BWA. Digest the reference sequence in silico and return mappable regions whose fragment sizes fall within the lower and upper bounds. Fragment ends are placed at the restriction cut site if one is provided; otherwise, the complete restriction sequence is considered part of the mappable fragment.
Params:
- reference_file (str): path to reference file in fasta format
- genome_database (str): directory where processed data files are written
- block_size (int): bwa indexing block size; larger values speed up indexing but increase memory consumption
- lower_bound (int): smallest mappable fragment size
- upper_bound (int): largest mappable fragment size
- cut_format (str): comma-separated list of restriction sites; - marks the cut site
- ignore_alt (bool): ignore alt contigs when constructing alignment index
Usage:
index = RRBSBuild(**kwargs)
index.generate_rrbs_database()
generate_rrbs_database
(self)
Wrapper for class functions to process and build mapping indices.
mask_contig
(self, contig_str, mappable_regions)
Given a list of mappable regions, merge overlapping mappable fragments if no cut site is designated, and return the DNA sequence with unmappable regions masked.
Params:
- contig_id (str): contig label
- contig_str (str): str of DNA sequence
- mappable_regions (list): list of mappable regions
Returns:
- masked_contig_sequence (str): str of DNA sequence with un-mappable regions masked
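The merge-then-mask step can be sketched with plain interval merging: overlapping or touching regions are combined first, then every base outside a merged region is replaced. The masking character `N` and the helper names `merge_regions` and `mask_contig` are assumptions for illustration; regions are taken as half-open `(start, end)` pairs.

```python
def merge_regions(regions):
    """Hypothetical sketch: merge overlapping or touching (start, end) regions."""
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it instead of adding a new one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mask_contig(contig_str, mappable_regions, mask_char="N"):
    """Hypothetical sketch: keep bases inside mappable regions, mask everything else."""
    masked = [mask_char] * len(contig_str)
    for start, end in merge_regions(mappable_regions):
        # Copy the original bases back for the mappable span
        masked[start:end] = contig_str[start:end]
    return "".join(masked)


print(mask_contig("ACGTACGTAC", [(0, 2), (6, 8)]))  # ACNNNNGTNN
```

Masking rather than deleting keeps the contig length unchanged, so alignment coordinates against the masked index still match the original reference.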
process_contig_region
(self, contig_id, contig_sequence)
Given a contig_id, output a pickle file of the whole sequence and a masked version of the sequence in which only mappable regions are reported.
Params:
- contig_id (str): contig label
- contig_sequence (list): a list of strings containing DNA sequence
process_rrbs_sequence
(self, contig_str)
Designate mappable regions by finding all occurrences of the restriction site in the passed DNA sequence. Merge the restriction map into regions by considering pairs of upstream and downstream restriction sites whose fragment size passes the size limits.
Params:
- contig_str (str): string of contiguous DNA sequence
Returns:
- mappable_regions (list): list of tuples containing the start and end positions of fragments within the size limits
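A minimal sketch of the digestion for a single recognition site, assuming fragments run between consecutive site occurrences. With an offset, fragment ends sit at the cut positions; with `offset=None` (no cut designated), each fragment includes both flanking recognition sequences, which is why adjacent fragments can overlap and need merging before masking. `find_mappable_regions` is a hypothetical name, and the half-open `(start, end)` coordinates are an assumption.

```python
import re


def find_mappable_regions(contig_str, site, offset=None, lower_bound=30, upper_bound=500):
    """Hypothetical sketch: digest a sequence at every occurrence of `site`."""
    # Zero-width lookahead so overlapping occurrences are also reported
    starts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", contig_str)]
    regions = []
    # Pair each restriction site with the next downstream site
    for upstream, downstream in zip(starts, starts[1:]):
        if offset is not None:
            # Fragment runs from one cut position to the next
            start, end = upstream + offset, downstream + offset
        else:
            # No cut designated: keep both full recognition sequences
            start, end = upstream, downstream + len(site)
        if lower_bound <= end - start <= upper_bound:
            regions.append((start, end))
    return regions


sequence = "CCGG" + "A" * 40 + "CCGG" + "A" * 10 + "CCGG"
print(find_mappable_regions(sequence, "CCGG", offset=1))  # [(1, 45)]
```

In the example, the second fragment is only 14 bp long and falls below the default `lower_bound` of 30, so it is dropped.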
bsbolt.Index.WholeGenomeBuild
(reference_file=None, genome_database=None, mappable_regions=None, block_size=None, ignore_alt=False)
Class to build a whole genome bisulfite BWA index.
Params:
- reference_file (str): path to reference file in fasta format
- genome_database (str): path to genome database output
- mappable_regions (str): path to bed file of mappable regions for masked alignment index building
- block_size (int): bwa indexing block size; larger values speed up indexing but increase memory consumption
- ignore_alt (bool): ignore alt contigs when constructing alignment index
Usage:
index = WholeGenomeBuild(**kwargs)
index.generate_bsb_database()
generate_bsb_database
(self)
Wrapper for class functions to process and build mapping indices.
get_mappable_regions
(bed_file)
Get mappable regions when masking with a bed file.
Params:
- bed_file (str): path to bed file of mappable regions
Returns:
- mappable_regions (list): sorted list of mappable regions
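Reading the bed file can be sketched as collecting `(contig, start, end)` tuples from the first three BED columns and sorting them. The helper names and the tuple layout are assumptions; the parsing is split from the file handling so it can be exercised on in-memory lines.

```python
def parse_bed_lines(lines):
    """Hypothetical sketch: collect (contig, start, end) tuples from BED lines."""
    regions = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments and UCSC track/browser header lines
        if not fields or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        # BED coordinates are 0-based and half-open; only the first three columns are used
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return sorted(regions)


def get_mappable_regions(bed_file):
    """Hypothetical sketch: read a BED file from disk and return its sorted regions."""
    with open(bed_file) as bed:
        return parse_bed_lines(bed)
```

Sorting the tuples orders regions by contig name and then by start position, which lets a masking step walk each contig's regions left to right.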
mask_contig
(self, contig_id, contig_str)
Mask contig sequence outside mappable regions.
Params:
- contig_id (str): contig label
- contig_str (str): contig sequence
Returns:
- contig_str (str): masked sequence
process_contig
(self, contig_id, contig_str)
Process contig sequence.