Advanced usage
capC-MAP is a suite of programs written in C++, along with a Python “front end” that allows the whole processing pipeline to be run with a single command. For advanced usage, each of the component programs can be run independently; these are documented here. In addition to the four core capC-MAP programs, there is an additional tool, capClocation2fragment.
capCdigestfastq
The program capCdigestfastq performs an in silico restriction enzyme digestion of a pair of fastq sequence files. The program can run in two modes. In the standard mode, as used in the capC-MAP pipeline, the program reads the two paired-end fastq files line by line, checking that the read names match for each pair. It splits each read of the pair into smaller restriction enzyme fragments at the specified cutting sequence. As well as the cut sequence, a cut position within that sequence must be specified; this does not have to match the real cut position of the enzyme and does not affect downstream capC-MAP results. All fragments are written to a single fastq file, with read names given in a format suitable for use with the capC-MAP program capCmain once the fastq has been mapped to the reference genome.
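To illustrate what splitting at a cut sequence with a given cut position means, the Python sketch below digests a single read and its quality string. It is not capCdigestfastq's implementation: the function name digest_read and the DpnII-style site GATC used in the example are hypothetical, and the read-naming scheme the real program writes for capCmain is not reproduced.

```python
def digest_read(seq, qual, site, cut_pos):
    """Split a read and its quality string at every occurrence of `site`.

    Hypothetical helper for illustration only. The cut is placed `cut_pos`
    bases into the site (0 = just before it). Overlapping occurrences are
    all found. Returns a list of (sequence, quality) fragments; empty
    fragments are dropped.
    """
    fragments = []
    start = 0  # start of the fragment currently being built
    hit = seq.find(site)
    while hit != -1:
        cut = hit + cut_pos
        if cut > start:
            fragments.append((seq[start:cut], qual[start:cut]))
            start = cut
        hit = seq.find(site, hit + 1)  # +1 so overlapping sites are found
    if start < len(seq):
        fragments.append((seq[start:], qual[start:]))
    return fragments


# Hypothetical example read containing two GATC sites.
read = "AAGATCTTGATCCC"
quals = "IIIIIIIIIIIIII"
print([f for f, _ in digest_read(read, quals, "GATC", 0)])
# ['AA', 'GATCTT', 'GATCCC']
print([f for f, _ in digest_read(read, quals, "GATC", 4)])
# ['AAGATC', 'TTGATC', 'CC']
```

Changing the cut position only moves the site's bases from one fragment to its neighbour; the fragment boundaries still fall at the same sites.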
The program can also run in an alternative “long” mode, where only the longest restriction enzyme fragment from each read of the pair is retained and the output is written to two separate fastq files. This output is not suitable for use with the capCmain program.
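The “long” mode can be pictured with the short Python sketch below, which keeps only the longest fragment of each read in a pair. The names longest_fragment and long_mode are hypothetical, and how capCdigestfastq breaks ties between equally long fragments is not documented here; this sketch simply takes the leftmost.

```python
import re


def longest_fragment(seq, site, cut_pos):
    """Return the longest fragment of `seq` after cutting `cut_pos` bases
    into every (possibly overlapping) occurrence of `site`.
    Ties go to the leftmost fragment (an assumption of this sketch)."""
    # Zero-width lookahead finds overlapping occurrences of the site.
    cuts = [min(m.start() + cut_pos, len(seq))
            for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = sorted({0, len(seq), *cuts})
    fragments = [seq[a:b] for a, b in zip(bounds, bounds[1:])]
    return max(fragments, key=len, default="")


def long_mode(read1, read2, site, cut_pos):
    """Keep only the longest fragment of each read; in the real program the
    two results go to two separate fastq files."""
    return longest_fragment(read1, site, cut_pos), longest_fragment(read2, site, cut_pos)


print(long_mode("AGATCTTTTTGATCC", "CCGATCA", "GATC", 0))
# ('GATCTTTTT', 'GATCA')
```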
capCmain
The capCmain program is the main workhorse of capC-MAP. It takes as input a name-sorted SAM file, generated by using bowtie to map the fastq file produced by capCdigestfastq to the reference genome. It also requires a map of restriction enzyme fragments for the reference genome (as generated by capC-MAP genomedigest) and a bed file containing a list of target restriction enzyme fragments. The output is, for each target, a list of intrachromosomal interactions and a list of interchromosomal interactions.
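capCmain's internal algorithm is not described here. As a rough illustration of how a fragment map can turn mapped positions into fragment-level interactions, the Python sketch below finds the fragment containing each mapped position by binary search (the standard bisect module) and splits the non-target fragments seen alongside a target into intrachromosomal and interchromosomal lists. All names (build_index, find_fragment, classify) are hypothetical; the positions are assumed to have already been extracted from the SAM file, and no filtering that capCmain may apply is reproduced.

```python
from bisect import bisect_right


def build_index(fragments):
    """Group bed intervals (chrom, start, end), 0-based half-open, by
    chromosome as parallel sorted lists of starts and ends."""
    by_chrom = {}
    for chrom, start, end in fragments:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        index[chrom] = ([s for s, _ in ivs], [e for _, e in ivs])
    return index


def find_fragment(index, chrom, pos):
    """Return the (chrom, start, end) fragment containing pos, or None."""
    if chrom not in index:
        return None
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last fragment starting at or before pos
    if i >= 0 and pos < ends[i]:
        return (chrom, starts[i], ends[i])
    return None


def classify(index, targets, positions):
    """For the mapped (chrom, pos) pieces of one read pair, return
    {target: (intra, inter)} listing the other fragments seen with it."""
    frags = {f for f in (find_fragment(index, c, p) for c, p in positions) if f}
    result = {}
    for target in frags & targets:
        intra = sorted(f for f in frags if f != target and f[0] == target[0])
        inter = sorted(f for f in frags if f[0] != target[0])
        result[target] = (intra, inter)
    return result


idx = build_index([("chr1", 0, 100), ("chr1", 100, 250), ("chr2", 0, 50)])
print(classify(idx, {("chr1", 0, 100)}, [("chr1", 10), ("chr1", 120), ("chr2", 5)]))
# {('chr1', 0, 100): ([('chr1', 100, 250)], [('chr2', 0, 50)])}
```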
capCpair2bg
The capCpair2bg program reads a single bed file listing intrachromosomal interactions (as output by capCmain) and generates a “pile-up” of interaction counts at each restriction enzyme fragment in bedGraph format, i.e. an interaction profile.
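A pile-up is a count of how many interactions land on each fragment. The Python sketch below shows the idea with collections.Counter. It assumes, hypothetically, that the first three tab-separated columns of each input line give the reporter fragment; the actual column layout of capCmain's output is not specified here. The name pileup is also hypothetical.

```python
from collections import Counter


def pileup(interaction_lines):
    """Count interactions per reporter fragment and return bedGraph lines
    (chrom, start, end, count), sorted by chromosome then position.
    Assumes the reporter fragment is in the first three columns."""
    counts = Counter()
    for line in interaction_lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        counts[(chrom, int(start), int(end))] += 1
    return [f"{c}\t{s}\t{e}\t{n}" for (c, s, e), n in sorted(counts.items())]


lines = ["chr1\t100\t250\n", "chr1\t0\t100\n", "chr1\t100\t250\n"]
for row in pileup(lines):
    print(row)
```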
capCpileup2binned
The capCpileup2binned program reads a restriction-enzyme-fragment-level intrachromosomal interaction profile (as generated by capCpair2bg) and generates a binned, smoothed, and normalized interaction profile. Optional parameters are a binning step size S and a window size W (as detailed in the capC-MAP run section), and a total number of reads T for normalization to reads per million. If only S and W are specified, the profile is not normalized; if only T is provided, a normalized profile at restriction enzyme fragment resolution is generated.
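Normalization to reads per million scales each value n to n × 10^6 / T. The Python sketch below combines that with a sliding window of width W stepped by S; overlapping windows (W larger than S) are what produce the smoothing. The rule used here for assigning a fragment to a window (its full count goes to every window containing its midpoint) is an assumption made for illustration; the rule capC-MAP actually uses is the one described in the capC-MAP run section. The names to_rpm and bin_profile are hypothetical.

```python
def to_rpm(value, total_reads):
    """Scale a count to reads per million: value * 1e6 / T."""
    return value * 1e6 / total_reads


def bin_profile(fragments, step, window, chrom_length, total_reads=None):
    """Bin a fragment-level profile for one chromosome.

    fragments: list of (start, end, count) bedGraph entries.
    Window k covers [k*step, k*step + window). A fragment contributes its
    full count to every window containing its midpoint (an assumption of
    this sketch). Values are scaled to RPM if total_reads is given.
    """
    bins = []
    for lo in range(0, chrom_length, step):
        hi = lo + window
        total = sum(n for s, e, n in fragments if lo <= (s + e) // 2 < hi)
        if total_reads is not None:
            total = to_rpm(total, total_reads)
        bins.append((lo, hi, total))
    return bins


frags = [(0, 100, 4), (100, 300, 2)]
print(bin_profile(frags, step=100, window=200, chrom_length=300))
# [(0, 200, 4), (100, 300, 2), (200, 400, 2)]
# Only T given: normalize at fragment resolution instead of binning.
print([(s, e, to_rpm(n, 2_000_000)) for s, e, n in frags])
# [(0, 100, 2.0), (100, 300, 1.0)]
```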
capClocation2fragment
The capClocation2fragment program reads a bed file of genomic intervals and a genome-wide map of restriction enzyme fragments (as generated by capC-MAP genomedigest). It finds the midpoint of each interval in the input file and outputs the restriction enzyme fragment in which that midpoint falls. This is useful for generating the targets file required by capC-MAP, in which each target must appear in the genome-wide fragment map.
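The midpoint-to-fragment lookup can be sketched in Python with a binary search over each chromosome's sorted fragment starts. This is an illustration, not the tool's source: the name location_to_fragment is hypothetical, intervals are assumed to be 0-based half-open bed coordinates, and the midpoint is rounded down, which may differ from the real program's rounding.

```python
from bisect import bisect_right


def location_to_fragment(intervals, fragments):
    """intervals, fragments: lists of (chrom, start, end), 0-based half-open.
    Returns, for each interval, the fragment containing its midpoint,
    or None if no fragment contains it."""
    by_chrom = {}
    for chrom, start, end in sorted(fragments):
        starts, ends = by_chrom.setdefault(chrom, ([], []))
        starts.append(start)
        ends.append(end)
    out = []
    for chrom, start, end in intervals:
        mid = (start + end) // 2  # midpoint, rounded down (assumption)
        starts, ends = by_chrom.get(chrom, ([], []))
        i = bisect_right(starts, mid) - 1  # last fragment starting at or before mid
        out.append((chrom, starts[i], ends[i]) if i >= 0 and mid < ends[i] else None)
    return out


frag_map = [("chr1", 0, 100), ("chr1", 100, 250)]
print(location_to_fragment([("chr1", 90, 130), ("chr1", 10, 20), ("chr2", 0, 10)], frag_map))
# [('chr1', 100, 250), ('chr1', 0, 100), None]
```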
A typical procedure for generating the targets file might be:
- Run the capture oligo sequences through BLAST to find their locations within the reference genome.
- Format the resulting list of locations into a bed file, and run this file through capClocation2fragment to find the restriction enzyme fragments to which the oligos map.
- Edit the output bed file to add useful target names and remove duplicate entries (oligos are typically designed so that there is one at either end of a restriction enzyme fragment, so both map to the same fragment).
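The first and last steps above can be partly scripted. The Python sketch below converts BLAST tabular output (-outfmt 6, whose 12 standard columns include the subject id in column 2 and the 1-based, inclusive subject start and end in columns 9 and 10) into bed intervals, and later collapses duplicate fragments so that each appears once. It assumes, as a simplification, that the first hit reported for each oligo is the correct one; the capClocation2fragment step in between is not reproduced. The names blast6_to_bed and dedupe_targets are hypothetical.

```python
def blast6_to_bed(lines):
    """Convert BLAST -outfmt 6 hits to bed tuples (chrom, start, end, oligo),
    keeping only the first hit reported for each query oligo."""
    seen = set()
    bed = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12 or cols[0] in seen:
            continue
        seen.add(cols[0])
        sstart, send = int(cols[8]), int(cols[9])
        # Subject coordinates are 1-based inclusive and reversed for
        # minus-strand hits; bed is 0-based half-open.
        bed.append((cols[1], min(sstart, send) - 1, max(sstart, send), cols[0]))
    return bed


def dedupe_targets(named_fragments):
    """Keep one (chrom, start, end, name) entry per fragment; the name of
    the first oligo seen for a fragment is kept."""
    kept = {}
    for chrom, start, end, name in named_fragments:
        kept.setdefault((chrom, start, end), name)
    return [(c, s, e, n) for (c, s, e), n in kept.items()]


hits = [
    "\t".join(["oligoA_L", "chr1", "100.0", "120", "0", "0", "1", "120", "1001", "1120", "1e-60", "222"]),
    "\t".join(["oligoA_R", "chr1", "100.0", "120", "0", "0", "1", "120", "2120", "2001", "1e-60", "222"]),
]
print(blast6_to_bed(hits))
# [('chr1', 1000, 1120, 'oligoA_L'), ('chr1', 2000, 2120, 'oligoA_R')]
print(dedupe_targets([("chr1", 500, 3000, "oligoA_L"), ("chr1", 500, 3000, "oligoA_R")]))
# [('chr1', 500, 3000, 'oligoA_L')]
```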