Structure of the code
- mirtop/bam
  - bam.py
    - read_bam: reads BAM files with pysam and stores the hits in a key-value object
  - filter.py
    - tune: if the option --clean is on, filters hits according to generic rules
    - clean_hits: gets the top hits
- mirtop/gff
  - init.py: wraps the conversion process to GFF3
  - body.py
    - create: creates a line according to the established GFF format
    - read_gff_line: called inside a for loop over the lines of the file; returns a key:value dictionary for each column
  - header.py: generates and reads the header section
  - check.py: checks that the header and single lines are valid according to the GFF format (NOT IMPLEMENTED)
  - stats.py: GFF stats, counting the number of isomiRs and their total and average expression
  - query.py: accepts SQLite queries after the option -q ""
  - convert.py
    - create_counts: table of counts
      - allow filtering by attribute
      - allow collapse by miRNA/isomiR type
  - filter.py: parse from query (NOT IMPLEMENTED)
- mirtop/mirna
  - fasta.py:
    - read_precursor: reads the precursor FASTA file into a key-value object
  - realign.py:
    - hits: class that defines hits
    - isomir: class that defines each sequence
    - cigar_correction: function that uses the CIGAR to build the sequence-to-miRNA alignment
    - read_id and make_id: shorter ID for sequences
    - make_cigar: given an alignment, returns its CIGAR
    - reverse_complement: returns the reverse complement of a sequence
    - align: uses biopython to align two sequences of the same size
    - expand_cigar: from 12M to MMMMMMMMMMMM
    - cigar2snp: from CIGAR code to a list of changes with position, reference and target nts
  - mapper.py:
    - read_gtf: reads a GTF file and maps genomic miRNA positions to precursor positions, so it needs the genomic position of both the miRNA and the precursor. The return value looks like {mirna: [start, end]}
  - annotate.py:
    - annotate: reads isomiRs and populates all attributes related to isomiRs
- mirtop/importer:
  - seqbuster.py
  - prost.py
  - srnabench.py
  - isomirsea.py
- mirtop/exporter:
  - isomirs.py: exports a file that matches the isomiRs BioC package.
- data/examples/
  - check gff files: examples of correct, invalid and warning GFF files
  - check BAM file
  - check mapping from genome position to precursor position, with examples of +/- strand, using mirtop/mirna/mapper.read_gtf.
  - check clean option: for sequences mapping to multiple precursors/miRNAs, get the best score, using mirtop/bam/filter.clean_hits.
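The two simplest helpers listed under realign.py, expand_cigar and reverse_complement, can be illustrated with a short sketch. This is not the code from mirtop/mirna/realign.py: the function bodies are independent re-implementations written for this page, and treating a letter without a leading count as a single operation is an assumption.

```python
import re

# Illustrative re-implementations; not copied from mirtop/mirna/realign.py.
# A token is an optional count followed by one operation letter.
_CIGAR_TOKEN = re.compile(r"(\d*)([A-Za-z=])")
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def expand_cigar(cigar):
    """Expand a run-length CIGAR such as '12M' into 'MMMMMMMMMMMM'."""
    expanded = []
    for count, op in _CIGAR_TOKEN.findall(cigar):
        # Assumption: a letter with no count stands for one operation.
        expanded.append(op * (int(count) if count else 1))
    return "".join(expanded)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Expanding the CIGAR to one character per position makes it possible to walk the read and the reference in step, which is the kind of traversal a function like cigar2snp needs.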
To add new sub-commands, modify the following:
- mirtop/lib/parse.py
- query: TODO
- transform: TODO
- create: TODO
- check: TODO
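As a rough illustration of what registering a sub-command can look like, here is a minimal sketch that assumes the parser is built with argparse sub-parsers. The sub-command name `example`, its options and the `build_parser` function are hypothetical placeholders, not the contents of mirtop/lib/parse.py.

```python
import argparse


def build_parser():
    """Build a top-level parser with one sub-parser per sub-command."""
    parser = argparse.ArgumentParser(prog="mirtop")
    subparsers = parser.add_subparsers(dest="command")

    # Hypothetical new sub-command; the name and options are placeholders.
    example = subparsers.add_parser("example", help="illustrative sub-command")
    example.add_argument("files", nargs="+", help="input files")
    example.add_argument("-o", "--out", default="out", help="output folder")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["example", "a.gff", "-o", "results"])
```

With this layout, the value stored in `args.command` tells the caller which sub-command was requested, so the dispatch code only has to map that name to the function implementing it.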