Structure of the code

  • mirtop/bam
    • bam.py
      • read_bam: reads BAM files with pysam and stores them in a key-value object
    • filter.py
      • tune: if option --clean is on, filters hits according to generic rules
      • clean_hits: get the top hits
  • mirtop/gff
    • init.py wraps the conversion process to GFF3
    • body.py: create builds each line according to the established GFF format.
      • read_gff_line: called inside a for loop to read each line of the file. It returns a structured key:value dictionary for each column.
    • header.py generates and reads the header section.
    • check.py checks that the header and individual lines are valid according to the GFF format (NOT IMPLEMENTED)
    • stats.py computes GFF stats: the number of isomiRs and their total and average expression
    • query.py accepts SQLite queries passed with option -q “”
    • convert.py
      • create_counts: creates a table of counts
      • allows filtering by attribute
      • allows collapsing by miRNA/isomiR type
    • filter.py parses filters from a query (NOT IMPLEMENTED)
  • mirtop/mirna
    • fasta.py:
      • read_precursor: reads the precursor FASTA file into a key-value object
    • realign.py:
      • hits: class that defines hits
      • isomir: class that defines each sequence
      • cigar_correction: function that uses the CIGAR to build the sequence-to-miRNA alignment
      • read_id and make_id: shorter ID for sequences
      • make_cigar: given an alignment, returns its CIGAR
      • reverse_complement: returns the reverse complement of a sequence
      • align: uses Biopython to align two sequences of the same size
      • expand_cigar: expands a CIGAR such as 12M into MMMMMMMMMMMM
      • cigar2snp: converts a CIGAR code into a list of changes with position, reference and target nucleotides
    • mapper.py:
      • read_gtf: maps genomic miRNA positions to precursor positions, so it needs the genomic positions of both the miRNA and the precursor. The return value looks like {mirna: [start, end]}
    • annotate.py:
      • annotate: reads isomiRs and populates all attributes related to them
  • mirtop/importer:
    • seqbuster.py
    • prost.py
    • srnabench.py
    • isomirsea.py
  • mirtop/exporter:
  • data/examples/
    • check GFF files: examples of correct, invalid and warning GFF files
    • check BAM file
    • check mapping from genome position to precursor position, with examples on the + and - strands. Uses mirtop/mirna/mapper.read_gtf.
    • check clean option: for sequences mapping to multiple precursors/miRNAs, keep the hit with the best score. Uses mirtop/bam/filter.clean_hits.
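
A few of the small helpers listed above are easier to picture with code. The sketch below is illustrative only: it reuses the names expand_cigar and reverse_complement from the outline, but the bodies are assumptions rather than mirtop's actual implementation. to_precursor_position is a hypothetical name, and its coordinate convention (inclusive genomic coordinates in, 0-based offsets from the 5' end of the precursor out) is assumed, not taken from the code.

```python
import re

# Illustrative sketch only: function bodies are assumptions, not mirtop's code.

_CIGAR_TOKEN = re.compile(r"(\d*)([A-Za-z=])")
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def expand_cigar(cigar):
    """Turn a run-length CIGAR such as '12M' into one letter per position."""
    return "".join(
        op * (int(count) if count else 1)
        for count, op in _CIGAR_TOKEN.findall(cigar)
    )


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_precursor_position(mirna, precursor, strand):
    """Map the genomic [start, end] of a miRNA to offsets inside its precursor.

    Hypothetical convention: inclusive genomic coordinates in, 0-based
    offsets counted from the 5' end of the precursor out.
    """
    m_start, m_end = mirna
    p_start, p_end = precursor
    if strand == "+":
        return [m_start - p_start, m_end - p_start]
    if strand == "-":
        # On the minus strand the 5' end of the precursor is its genomic end.
        return [p_end - m_end, p_end - m_start]
    raise ValueError("strand must be '+' or '-'")


print(expand_cigar("3M1I2M"))                               # MMMIMM
print(reverse_complement("AACG"))                           # CGTT
print(to_precursor_position([110, 131], [100, 180], "+"))   # [10, 31]
print(to_precursor_position([150, 171], [100, 180], "-"))   # [9, 30]
```

The minus-strand branch is the case the +/- strand example data is meant to exercise: offsets are measured from the genomic end of the precursor, so start and end swap roles.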

To add new sub-commands, modify the following:

  • mirtop/lib/parse.py
    • query: TODO
    • transform: TODO
    • create: TODO
    • check: TODO
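
As a rough picture of what registering these sub-commands could involve, here is a minimal sketch. It assumes parse.py is built on argparse sub-parsers, which this outline does not state. build_parser, the help strings and the dest name sql are hypothetical; only the sub-command names and the -q option come from the outline above.

```python
import argparse


def build_parser():
    """Hypothetical layout: one argparse sub-parser per sub-command."""
    parser = argparse.ArgumentParser(prog="mirtop")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.required = True

    # query takes the query string through -q, as described for query.py.
    query = subparsers.add_parser("query", help="query a GFF file")
    query.add_argument("-q", dest="sql", default="", help="query string")

    # Remaining sub-commands are registered without options here because
    # their arguments are still marked TODO in the outline.
    for name in ("transform", "create", "check"):
        subparsers.add_parser(name, help="placeholder for " + name)

    return parser


args = build_parser().parse_args(["query", "-q", "select * limit 1"])
print(args.command, args.sql)
```

Under this layout, a new sub-command is one more add_parser call plus its own arguments, and the chosen name is available afterwards in args.command for dispatching.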